How to find incorrect records from a file in Hive

Suppose a JSON file contains 1,000 records, and all of them are loaded into a Hive table. If one of those records is incorrect, how can I find that bad record?
Jul 25, 2019 in Big Data Hadoop by Robby

1 answer to this question.


A value with the wrong data type causes the generated MapReduce job to crash, and setting ignore.malformed.json does not seem to fix it.

Here is the sample data, mixed2.json:

{"f1":"hello", "f2":7}
{"f1":"goodbye", "f2":8}
{"f1":"this", "f2":9}
{"f1":"that", "f2":"ten"}

Here is the sample Hive script, mixed2.hive. The first query (on f1) works, but the other queries (on f2 and *) crash. It would be nicer to get NULL, or some other marker, for the bad value. By contrast, the get_json_object() function returns the bad value as a string, so it prints "ten".

drop table mixed2;

create table mixed2 (f1 string, f2 int)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties ("ignore.malformed.json" = "true")
stored as textfile;

load data inpath '/tmp/mixed2.json' overwrite into table mixed2;

select f1 from mixed2;  -- works
select f2 from mixed2;  -- crashes: "ten" cannot be read as int
select * from mixed2;   -- crashes for the same reason

You should declare the column as STRING instead of INT. The SerDe can read the numbers into strings, and you can then CAST them to INT in Hive.
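A minimal sketch of that approach, reusing the mixed2 table and /tmp/mixed2.json path from the question and assuming, as stated above, that the openx JsonSerDe reads numeric JSON values into a STRING column. A row is flagged as bad when f2 is present but casting it to INT yields NULL; Hive's CAST returns NULL for a string such as "ten" instead of failing the job. The view name mixed2_typed is hypothetical, chosen only for illustration.

```sql
-- Assumption: the openx JsonSerDe jar is already available to Hive, as in the question.
drop table if exists mixed2;

-- Declare f2 as STRING so every row, including {"f2":"ten"}, can be read.
create table mixed2 (f1 string, f2 string)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties ("ignore.malformed.json" = "true")
stored as textfile;

load data inpath '/tmp/mixed2.json' overwrite into table mixed2;

-- Find the bad records: f2 is present but is not a valid integer.
select f1, f2
from mixed2
where f2 is not null
  and cast(f2 as int) is null;

-- Hypothetical typed view for downstream queries; bad values appear as NULL.
create view if not exists mixed2_typed as
select f1, cast(f2 as int) as f2
from mixed2;
```

On the sample data, the first SELECT should return only the "that" / "ten" row. Note that LOAD DATA INPATH moves the file within HDFS, so the source file has to be put back before the script can be rerun.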

Abnormalities like this can be handled to some extent, but if the schema changes entirely, the data cannot be loaded at all.

answered Jul 25, 2019 by Ritu
