How to create smaller table from big table in HIVE

0 votes

I currently have a Hive table that has 1.5 billion rows. I would like to create a smaller table (using the same table schema) with about 1 million rows from the original table. Ideally, the new rows would be randomly sampled from the original table, but getting the top 1M or bottom 1M of the original table would be ok, too. How would I do this?

Sep 24, 2018 in Big Data Hadoop by slayer
• 29,370 points
1,736 views

1 answer to this question.

0 votes

You could probably best use Hive's built-in sampling methods.

INSERT OVERWRITE TABLE my_table_sample 
SELECT * FROM my_table 
TABLESAMPLE (1m ROWS) t;

This syntax was introduced in Hive 0.11. If you are running an older version of Hive, you'll be confined to using the PERCENT syntax like so.

INSERT OVERWRITE TABLE my_table_sample 
SELECT * FROM my_table 
TABLESAMPLE (1 PERCENT) t;
answered Sep 24, 2018 by digger
• 26,740 points

Related Questions In Big Data Hadoop

0 votes
1 answer

How to create a Hive table from sequence file stored in HDFS?

There are two SerDe for SequenceFile as ...READ MORE

answered Dec 18, 2018 in Big Data Hadoop by Omkar
• 69,220 points
4,993 views
0 votes
1 answer

How to create a parquet table in hive and store data in it from a hive table?

Please use the code attached below for ...READ MORE

answered Jan 28, 2019 in Big Data Hadoop by Omkar
• 69,220 points
18,669 views
0 votes
1 answer

How to create a managed table in Hive?

You can use this command: create table employee(Name ...READ MORE

answered Dec 14, 2018 in Big Data Hadoop by Omkar
• 69,220 points
5,389 views
0 votes
1 answer

How to save Spark dataframe as dynamic partitioned table in Hive?

Hey, you can try something like this: df.write.partitionBy('year', ...READ MORE

answered Nov 6, 2018 in Big Data Hadoop by Omkar
• 69,220 points
8,520 views
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
11,028 views
+2 votes
11 answers

hadoop fs -put command?

Hi, You can create one directory in HDFS ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
108,832 views
–1 vote
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,350 points
4,612 views
0 votes
1 answer
+2 votes
5 answers

How to transpose/pivot data in hive?

Below is also a way for Pivot SELECT ...READ MORE

answered Oct 12, 2018 in Big Data Hadoop by Rahul
19,979 views
0 votes
1 answer

How to execute python script in hadoop file system (hdfs)?

If you are simply looking to distribute ...READ MORE

answered Sep 19, 2018 in Big Data Hadoop by digger
• 26,740 points
13,553 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP