Merging Files using PIG

How do I merge multiple small .txt files stored in an HDFS location into a single file using Pig?
Jul 9, 2019 in Big Data Hadoop by Ritu

To merge two or more files into a single file in HDFS, first put all the files you want to merge into one HDFS folder. The merge itself can then be done with HDFS shell commands rather than a Pig script.

Here, I have a folder named merge_files that contains the files I want to merge.

Then run the following command to merge the files and store the result in HDFS:

hadoop fs -cat "/user/edureka_425640/merge_files/*" | hadoop fs -put - /user/edureka_425640/merged_files

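You don't need to create merged_files beforehand; the command creates it. Note that merged_files is a single file, not a folder: hadoop fs -cat streams the contents of every file in merge_files to standard output, and hadoop fs -put - reads that stream from standard input and writes it to the destination path. Without the -f option, -put fails if merged_files already exists. You can view the merged output with the following command: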

hadoop fs -cat /user/edureka_425640/merged_files
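One caveat with the -cat pipe: it joins the files byte for byte, so if a file does not end with a newline, its last line runs into the first line of the next file. The sketch below is one possible alternative, assuming Hadoop 2.x or later (where hadoop fs -getmerge accepts the -nl flag) and enough local disk space to stage the merged data. The function name merge_hdfs_dir is hypothetical, not a Hadoop command. It merges the folder into a local file with -getmerge -nl and uploads the result with -put. Be aware that -nl adds a newline after every file, so files that already end in a newline will leave an empty line in the output.

```shell
#!/usr/bin/env bash
# merge_hdfs_dir is a hypothetical helper name, not a Hadoop command.
# Usage: merge_hdfs_dir <hdfs_source_dir> <hdfs_dest_file>
# Assumes Hadoop 2.x or later, where "hadoop fs -getmerge" accepts -nl.
merge_hdfs_dir() {
  local src_dir="$1" dest_file="$2"
  local tmp_dir local_file status

  # Stage the merged data in a fresh local temp directory.
  tmp_dir="$(mktemp -d)" || return 1
  local_file="$tmp_dir/merged.txt"

  # -getmerge concatenates every file under src_dir into one local file;
  # -nl writes a newline after each file so lines cannot run together.
  # -put then uploads it; without -f it fails if dest_file already exists.
  hadoop fs -getmerge -nl "$src_dir" "$local_file" &&
    hadoop fs -put "$local_file" "$dest_file"
  status=$?

  # Remove the local copy whether or not the upload succeeded.
  rm -rf "$tmp_dir"
  return "$status"
}
```

For the folder used in this answer, the call would be: merge_hdfs_dir /user/edureka_425640/merge_files /user/edureka_425640/merged_files. As with the pipe version, all the data passes through the client machine, which is fine for a handful of small .txt files but not for large datasets.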

answered Jul 9, 2019 by Tina
