How to limit the number of rows per each item in a Hive QL

+1 vote

Say I have multiple items listed in a where clause.
How do I limit to N for each item in the list?

EX:

select a_id,b,c, count(*), as sumrequests
from table_name
where
a_id in (1,2,3)
group by a_id,b,c
limit 10000
Dec 1, 2018 in Big Data Hadoop by slayer
• 29,370 points
27,005 views

1 answer to this question.

+1 vote
SELECT a_id, b, c, count(*) as sumrequests
FROM (
    SELECT a_id, b, c, row_number() over (Partition BY a_id) as row
    FROM table_name
    ) rs
WHERE row <= 10000
AND a_id in (1, 2, 3)
GROUP BY a_id, b, c;

Try the above code and This will output up to 10,000 randomly-chosen rows per a_id. You can partition it further if you're looking to group by more than just a_id.

answered Dec 1, 2018 by Omkar
• 69,220 points

One doubt here for the subquery if you can answer

SELECT a_id, b, c, row_number() over (Partition BY a_id) as row FROM table_name

This inner query will be executed for all the rows for each request.
For example if user needs data from 50th row  for one request, next user need to see from 100 th row (concept of pagination) so inner query will be executed for each request. Can that be a performance issue.

Yes @Debapriya, this query will be executed for each request and it may cause performance issue but in such cases we need to choose between time and space. If you want to decrease space complexity(if this query needs to be executed frequently), one way to do this it is by creating another sub-table of the result and then get result from that data. But this will occupy space.

Related Questions In Big Data Hadoop

0 votes
2 answers

How to change the location of a table in hive?

Changing location requires 2 steps: 1.) Change location ...READ MORE

answered Feb 12, 2020 in Big Data Hadoop by Saksham Sehrawet
9,070 views
0 votes
1 answer

How to see the content of a table in hive?

Hello, If you want to see the content ...READ MORE

answered May 15, 2019 in Big Data Hadoop by Gitika
• 65,770 points
5,192 views
0 votes
1 answer

How to Modify the Maximum Number of Versions for a Column Family in Hbase?

Hey, The example uses HBase Shell to keep ...READ MORE

answered May 31, 2019 in Big Data Hadoop by Gitika
• 65,770 points
3,607 views
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
11,028 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
2,536 views
+2 votes
11 answers

hadoop fs -put command?

Hi, You can create one directory in HDFS ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
108,832 views
–1 vote
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,350 points
4,612 views
+1 vote
1 answer

How to count number of rows in alias in PIG?

COUNT is part of pig LOGS= LOAD 'log'; LOGS_GROUP= ...READ MORE

answered Oct 15, 2018 in Big Data Hadoop by Omkar
• 69,220 points
2,812 views
0 votes
1 answer

Hadoop Hive: How to skip the first line of csv while loading in hive table?

You can try this: CREATE TABLE temp ...READ MORE

answered Nov 8, 2018 in Big Data Hadoop by Omkar
• 69,220 points
8,947 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP