Why is Apache Pig used instead of plain Hadoop MapReduce?

I understand that Pig works on top of Apache Hadoop and uses its own language, called Pig Latin. But I am confused about why Pig was developed in the first place. In other words, what features does Pig provide that Apache Hadoop does not? Why should I take on the overhead of learning a new language, Pig Latin?
May 8, 2018 in Big Data Hadoop by Meci Matt

1 answer to this question.

As you know, writing MapReduce programs in Java or any other language is quite complex. You may have to write around 100 lines of Java code for a simple sort. Also, a company typically has many people (analysts) who are comfortable writing SQL-like queries and therefore want similar functionality out of the box from Hadoop. Pig provides the following features:
  • Pig Latin is a high-level data flow language, whereas MapReduce is a low-level data processing paradigm.
  • Programmers can achieve the same results in Pig Latin without writing complex Java MapReduce implementations.
  • Apache Pig uses a multi-query approach (i.e. a single Pig Latin query can accomplish multiple MapReduce tasks), which is often cited as reducing code length by about 20 times and development time by almost 16 times.
  • Pig provides many built-in operators for data operations such as joins, filters, and ordering, whereas performing the same operations in raw MapReduce is a substantial task.
  • In addition, it provides nested data types such as tuples, bags, and maps, which are missing from MapReduce.
Basically, Pig provides an abstraction that avoids the complexity of writing MapReduce programs and offers query operations such as joins and group by out of the box. This makes a data engineer's life easier when managing and running ad hoc queries on the data.
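
To make the contrast concrete, here is a minimal sketch in plain Python (not actual Hadoop code) of the three steps a hand-written MapReduce job has to spell out for a simple word count. The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for illustration. The comment at the top shows roughly how the same data flow is commonly written in Pig Latin, assuming an input file named `input.txt`.

```python
from collections import defaultdict

# Roughly equivalent Pig Latin (the file name 'input.txt' is an assumption):
#   lines   = LOAD 'input.txt' AS (line:chararray);
#   words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
#   grouped = GROUP words BY word;
#   counts  = FOREACH grouped GENERATE group, COUNT(words);


def map_phase(lines):
    """Emit a (word, 1) pair for every word, as a MapReduce mapper would."""
    for line in lines:
        for word in line.split():
            yield (word, 1)


def shuffle(pairs):
    """Group values by key; Hadoop does this between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, as a MapReduce reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases by hand (illustrative only)."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["pig runs on hadoop", "pig latin is a data flow language"]))
```

In a real Java MapReduce job, each phase additionally needs its own mapper class, reducer class, and driver code to configure and submit the job, which is where most of the extra lines come from.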

 

Pig is a high-level platform for creating MapReduce programs used with Hadoop. The language for this platform is called Pig Latin. Pig Latin abstracts the programming away from the Java MapReduce idiom into a higher-level notation, much as SQL does for RDBMS systems. Pig Latin can be extended with UDFs (user-defined functions), which the user can write in Java, Python, or JavaScript and then call directly from the language.
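
As an illustration of the UDF mechanism mentioned above, here is a minimal sketch of what a Python UDF might look like. The function name `normalize`, the file name `udfs.py`, the alias `myudfs`, and the relation `users` are all hypothetical. How the `outputSchema` decorator reaches the script depends on the script engine you use, so treat the fallback below as an assumption to check against the Pig documentation for your version.

```python
# Fallback so this file also runs outside Pig. Inside Pig the decorator is
# normally supplied by the runtime (assumption: engine-dependent behaviour).
try:
    from pig_util import outputSchema
except ImportError:
    def outputSchema(schema):
        """Stand-in decorator that just records the declared schema."""
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("normalized:chararray")
def normalize(text):
    """Trim whitespace and lower-case a string field; pass nulls through."""
    if text is None:
        return None
    return text.strip().lower()


# Hypothetical use from Pig Latin (file, alias, and relation names are made up):
#   REGISTER 'udfs.py' USING jython AS myudfs;
#   cleaned = FOREACH users GENERATE myudfs.normalize(name);

if __name__ == "__main__":
    print(normalize("  Hadoop "))
```

The point is that the custom logic lives in an ordinary function, while loading, grouping, and distributing the data stay in Pig Latin.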

The key words above are "high-level" and "abstracts". Just as DBAs can create and manage databases knowing SQL but no general-purpose programming language, data engineers can create and manage data pipelines and warehouses using Pig without getting into the details of how the work is implemented and executed as Hadoop jobs. So, to answer your question: Pig does not add features that Hadoop lacks; it is a high-level framework built on top of Hadoop that shortens development time.

You can certainly do everything Pig does with plain Hadoop, but try some of Pig's advanced features and you will find that writing the equivalent Hadoop jobs takes considerable time. Loosely speaking, tasks that are common throughout data engineering have already been implemented on bare Hadoop in the form of Pig; you only need to express them in Pig Latin to have them executed.

answered May 8, 2018 by Ashish
