Use of Group ALL in Pig

Question

Can someone explain with an example how to use GROUP ALL in pig?

score 0 · Answer 1 · Jul 10, 2019

Suppose we have a data set as follows:

001,Rajiv,Reddy,21,9848022337,Hyderabad

002,siddarth,Battacharya,22,9848022338,Kolkata

003,Rajesh,Khanna,22,9848022339,Delhi

004,Preethi,Agarwal,21,9848022330,Pune

005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar

006,Archana,Mishra,23,9848022335,Chennai

007,Komal,Nayak,24,9848022334,trivendram

008,Bharathi,Nambiayar,24,9848022333,Chennai

and we are loading this table as follows:

The record coming out of group all has the chararray literal 'all' as a key and the complete content as the value.That means you would have only one key for all the values which means only one reducer would process the complete file.Because grouping collects all records together with the same value for the key.

Group all is used to group a relation by all the columns as shown below.

grunt> group_all = GROUP student_details All;

Now, verify the content of the relation group_all as shown below.

grunt> Dump group_all;

(all,{(8,Bharathi,Nambiayar,24,9848022333,Chennai),(7,Komal,Nayak,24,9848022334 ,trivendram),

(6,Archana,Mishra,23,9848022335,Chennai),(5,Trupthi,Mohanthy,23,9848022336,Bhuw aneshwar),

(4,Preethi,Agarwal,21,9848022330,Pune),(3,Rajesh,Khanna,22,9848022339,Delhi),

(2,siddarth,Battacharya,22,9848022338,Kolkata),(1,Rajiv,Reddy,21,9848022337,Hyd erabad)})