How to create a Hive table from a sequence file stored in HDFS

0 votes

What I need is:
I have a sequence file stored in HDFS, and I have to create a Hive table for it. Which SerDe is used here?

Dec 17, 2018 in Big Data Hadoop by digger
• 26,740 points
5,072 views

1 answer to this question.

0 votes

Apache Tajo provides two SerDes for SequenceFile:

TextSerializerDeserializer: This class reads and writes data in plain text format.

BinarySerializerDeserializer: This class reads and writes data in binary format.

In Tajo, the default is the plain-text SerDe, so a table created without the sequencefile.serde option uses TextSerializerDeserializer. To use BinarySerializerDeserializer, specify it with the sequencefile.serde key:

CREATE TABLE tablename (id int, name text, score float, type text)
USING sequencefile with ('sequencefile.serde'='org.apache.tajo.storage.BinarySerializerDeserializer')

In Hive, the equivalent statement is:

CREATE TABLE tablename (id int, name string, score float, type string)
ROW FORMAT SERDE
 'org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe'
STORED AS sequencefile;
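
Since the question is about a SequenceFile that already exists in HDFS, an external table pointing at its directory may be the more direct route. The statement below is a minimal sketch, not a definitive recipe. The table name seq_scores, the LOCATION path, the column list, and the tab delimiter are all assumptions for illustration and have to match the actual data. It relies on Hive's default SerDe (LazySimpleSerDe), which only fits if the record values are delimited text. Hive reads the value of each SequenceFile record and ignores the key.

```sql
-- Hypothetical table name, columns, delimiter, and HDFS directory.
-- Assumes each record value is a tab-separated Text line; the key is ignored.
-- LOCATION must be the directory that holds the sequence file(s), not the file itself.
CREATE EXTERNAL TABLE seq_scores (
  id    INT,
  name  STRING,
  score FLOAT,
  type  STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS SEQUENCEFILE
LOCATION '/user/hive/data/seq_scores';
```

Because the table is external, dropping it removes only the metadata and leaves the files in HDFS untouched.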

Writer

There are three SequenceFile Writers based on the SequenceFile.CompressionType used to compress key/value pairs:

Writer : Uncompressed records.

RecordCompressWriter : Record-compressed files; only the values are compressed.

BlockCompressWriter : Block-compressed files, both keys & values are collected in ‘blocks’ separately and compressed. The size of the ‘block’ is configurable.

In Tajo, the default is the uncompressed Writer. To use RecordCompressWriter, set the compression.type and compression.codec keys:

CREATE TABLE tablename (id int, name text, score float, type text)
USING sequencefile with ('compression.type'='RECORD','compression.codec'='org.apache.hadoop.io.compress.SnappyCodec')

In Hive, specify the settings as follows:

hive> SET hive.exec.compress.output = true;
hive> SET mapred.output.compression.type = RECORD;
hive> SET mapred.output.compression.codec = org.apache.hadoop.io.compress.SnappyCodec;
hive> CREATE TABLE tablename (id int, name string, score float, type string) STORED AS sequencefile;
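
Note that these SET properties take effect when a query writes data, not when the table is created, so the compression shows up only once rows are inserted in the same session. A minimal sketch of such a write follows. The source table staging_scores is hypothetical and is assumed to have matching columns. On newer Hadoop versions the mapred.* names are deprecated aliases of mapreduce.output.fileoutputformat.compress.type and mapreduce.output.fileoutputformat.compress.codec, so either form should work.

```sql
-- staging_scores is a hypothetical source table with the same four columns.
-- Run this in the same session as the SET commands so the
-- compression settings apply to the files this query writes.
INSERT OVERWRITE TABLE tablename
SELECT id, name, score, type
FROM staging_scores;
```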

To use BlockCompressWriter, set the same compression.type and compression.codec keys:

CREATE TABLE tablename (id int, name text, score float, type text)
USING sequencefile with ('compression.type'='BLOCK','compression.codec'='org.apache.hadoop.io.compress.SnappyCodec')

In Hive, specify the settings as follows:

hive> SET hive.exec.compress.output = true;
hive> SET mapred.output.compression.type = BLOCK;
hive> SET mapred.output.compression.codec = org.apache.hadoop.io.compress.SnappyCodec;
hive> CREATE TABLE tablename (id int, name string, score float, type string) STORED AS sequencefile;

Either SerDe can be combined with the compression keys. For example:

CREATE TABLE tablename (id int, name text, score float, type text)
USING sequencefile with ('sequencefile.serde'='org.apache.tajo.storage.BinarySerializerDeserializer', 'compression.type'='BLOCK','compression.codec'='org.apache.hadoop.io.compress.SnappyCodec')

In Hive, specify the settings as follows:

hive> SET hive.exec.compress.output = true;
hive> SET mapred.output.compression.type = BLOCK;
hive> SET mapred.output.compression.codec = org.apache.hadoop.io.compress.SnappyCodec;
hive> CREATE TABLE tablename (id int, name string, score float, type string)
      ROW FORMAT SERDE
        'org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe'
      STORED AS sequencefile;
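
To confirm which SerDe and file format a table actually ended up with, one option is to inspect its metadata. This is a small sketch using the example table name from above. The storage section of the output lists the SerDe Library, InputFormat, and OutputFormat recorded for the table.

```sql
-- Shows table metadata, including the SerDe library and the
-- input/output format classes stored for the table.
DESCRIBE FORMATTED tablename;
```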
answered Dec 18, 2018 by Omkar
• 69,220 points
