How can Hadoop process records that are split across block boundaries?

Assume a record line is split between two blocks (b1 and b2). The mapper processing the first block (b1) notices that the last line has no EOL separator and fetches the rest of the line from the next block of data (b2).

How does the mapper processing the second block (b2) determine that the first record in b2 is incomplete and that it should start processing from the second record?
Apr 15, 2019 in Big Data Hadoop by nitinrawat895

First of all, MapReduce does not work on the physical HDFS blocks of a file; it works on logical input splits. When a file is written to HDFS, it is cut into fixed-size blocks, and by default each input split covers one block. Because the cut is made at a fixed byte offset, a single record can span two blocks and therefore two splits.

HDFS divides files into blocks of 128 MB each by default (64 MB in Hadoop 1.x) and stores each block on several DataNodes; the default replication factor is three. The blocks of a single file are therefore spread across different nodes of the Hadoop cluster.
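
To make the block arithmetic concrete, here is a small Java sketch (not Hadoop code) that lists the blocks a file of a given size occupies, assuming the 128 MB default block size. It models only offsets and lengths; replication and block placement are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: how a file of a given length maps onto fixed-size HDFS blocks.
// Only the arithmetic is modeled; replication and placement are not.
class BlockLayout {
    static final long MB = 1024L * 1024L;

    // Returns {offset, length} pairs, one per block of the file.
    static List<long[]> blocks(long fileLength, long blockSize) {
        List<long[]> result = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            // The last block holds only the remaining bytes; it is not padded.
            long length = Math.min(blockSize, fileLength - offset);
            result.add(new long[] {offset, length});
        }
        return result;
    }

    public static void main(String[] args) {
        long fileLength = 300 * MB;
        for (long[] b : blocks(fileLength, 128 * MB)) {
            System.out.printf("offset=%d MB, length=%d MB%n", b[0] / MB, b[1] / MB);
        }
    }
}
```

For a 300 MB file this yields two full 128 MB blocks and a final 44 MB block.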

HDFS does not look at the content of the files it stores; it simply cuts them at fixed byte offsets. As a result, a record can start in block A and end in block B.
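
The following Java sketch illustrates this with a hypothetical 16-byte block size so the effect is visible on a tiny newline-delimited input: it reports which blocks each record's first and last byte fall into. The block size and sample data are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Sketch: fixed-offset blocking ignores record boundaries.
// The 16-byte block size is hypothetical; real HDFS blocks are far larger.
class StraddleDemo {
    // Index of the block that holds byte `offset`.
    static int blockOf(long offset, long blockSize) {
        return (int) (offset / blockSize);
    }

    // True if bytes [start, endInclusive] fall into more than one block.
    static boolean straddles(long start, long endInclusive, long blockSize) {
        return blockOf(start, blockSize) != blockOf(endInclusive, blockSize);
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo charlie\ndelta\n".getBytes(StandardCharsets.US_ASCII);
        long blockSize = 16;
        int lineStart = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                // The record occupies bytes [lineStart, i], including its newline.
                String text = new String(data, lineStart, i - lineStart, StandardCharsets.US_ASCII);
                System.out.printf("%-14s blocks %d..%d%s%n", text,
                        blockOf(lineStart, blockSize), blockOf(i, blockSize),
                        straddles(lineStart, i, blockSize) ? "  <- crosses a block boundary" : "");
                lineStart = i + 1;
            }
        }
    }
}
```

Here "bravo charlie" occupies bytes 6 to 19, so it starts in block 0 and ends in block 1.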

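To solve this problem, Hadoop uses a logical representation of the data stored in file blocks, known as input splits. When a client submits a MapReduce job, the job's InputFormat computes the input splits. A split is only a byte range of a file (by default, one block), and computing it does not involve reading any records. Working out where the first complete record in a split starts and where the last record ends is the job of the RecordReader that each mapper uses to read its split.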
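
As a rough illustration, here is a simplified Java sketch modeled on the split logic in Hadoop's FileInputFormat.getSplits(). The split-size formula max(minSize, min(maxSize, blockSize)) and the 1.1 "slop" factor follow FileInputFormat; host/locality information and non-splittable (for example, some compressed) files are left out, so treat it as a sketch rather than the real implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch modeled on FileInputFormat.getSplits():
// splits are plain byte ranges; no record is inspected at this stage.
class SplitSketch {
    // The last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns {start, length} pairs covering [0, fileLength).
    static List<long[]> splits(long fileLength, long splitSize) {
        List<long[]> result = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            result.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        if (remaining != 0) {
            result.add(new long[] {fileLength - remaining, remaining});
        }
        return result;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // With default min/max settings the split size equals the block size.
        long splitSize = computeSplitSize(128 * mb, 1, Long.MAX_VALUE);
        for (long[] s : splits(300 * mb, splitSize)) {
            System.out.printf("start=%d MB, length=%d MB%n", s[0] / mb, s[1] / mb);
        }
    }
}
```

Note that the split boundaries are pure byte offsets: nothing here checks whether a line ends at the boundary, which is why the record reader has to deal with partial lines.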

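The split itself does not carry the data of the next block: it is only a file path, a start offset, a length and a list of hosts that hold the data. The incomplete record is handled by the RecordReader. For text input (TextInputFormat), LineRecordReader follows two rules. First, the reader for every split except the first one discards everything up to and including the first newline in its split, because that line is read by the previous split's reader. Second, every reader reads one line past the end of its split, so it finishes the record that crosses the boundary, pulling the remaining bytes of the next block over HDFS (possibly from another DataNode). So the mapper for b2 does not need to detect whether its first record is incomplete: it always skips its first line, and the mapper for b1 always completes the line that crosses the boundary.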
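
Below is a simplified, single-machine Java sketch of the boundary rule used by Hadoop's LineRecordReader. It is not the Hadoop class: it reads an in-memory byte array instead of an HDFS stream, recognizes only '\n' as a delimiter, ignores compression and maximum line length, and the 16-byte split size is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of the LineRecordReader boundary rule (simplified: '\n' only,
// no compression, no max line length, in-memory data instead of HDFS).
class LineSplitReader {
    // Returns the lines that the reader for split [start, end) would emit.
    static List<String> read(byte[] file, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        // Rule 1: every split except the first discards its first "line",
        // i.e. everything up to and including the first '\n' at or after start.
        // Those bytes are read by the previous split's reader.
        if (start != 0) {
            while (pos < file.length && file[pos] != '\n') pos++;
            pos++; // step past the newline
        }
        // Rule 2: keep reading while the next line starts at or before `end`,
        // even if that line runs past `end` (into the next block).
        while (pos <= end && pos < file.length) {
            int lineEnd = pos;
            while (lineEnd < file.length && file[lineEnd] != '\n') lineEnd++;
            lines.add(new String(file, pos, lineEnd - pos, StandardCharsets.US_ASCII));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] file = "alpha\nbravo charlie\ndelta\n".getBytes(StandardCharsets.US_ASCII);
        int splitSize = 16; // hypothetical tiny split so one line crosses the boundary
        for (int start = 0; start < file.length; start += splitSize) {
            int end = Math.min(start + splitSize, file.length);
            System.out.println("split [" + start + ", " + end + "): " + read(file, start, end));
        }
    }
}
```

Running it, the first split emits "alpha" and the complete "bravo charlie" (reading past byte 16), while the second split skips the tail "lie" and emits only "delta", so every line is processed exactly once. The same holds when a line starts exactly on a split boundary: the earlier reader takes it and the later one discards it.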

answered Apr 15, 2019 by nitinrawat895