Why Hadoop is not implemented using MPI

Question

What are the technical reasons why Hadoop does not use MPI for communication between nodes?

I could hazard a few guesses, but I do not know enough of how MPI is implemented "under the hood" to know whether or not I'm right.

Come to think of it, I'm not entirely familiar with Hadoop's internals either. I understand the framework at a conceptual level (map/combine/shuffle/reduce and how that works at a high level), but I don't know the nitty-gritty implementation details. I've always assumed Hadoop transmits serialized data structures (perhaps GPBs) over a TCP connection, e.g. during the shuffle phase. Let me know if that's not true.

Frankie · Answer 1 · Dec 4, 2018

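One of the big features of Hadoop/MapReduce is fault tolerance: if a node dies partway through a job, the framework re-runs that node's tasks elsewhere and the job still completes. Most current MPI implementations do not support this. With MPI's default error handler (MPI_ERRORS_ARE_FATAL), the failure of a single process aborts the whole job. Fault tolerance is being considered for future versions of Open MPI.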
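
To make the fault-tolerance point concrete, here is a rough Python sketch of per-task re-execution. It is not Hadoop's actual code or API: the names `run_with_retries`, `run_job`, and `WorkerFailure`, the `MAX_ATTEMPTS` limit, and the injected failures are all hypothetical and exist only for illustration. The idea it shows is that when one task's worker is lost, only that task is re-run and the rest of the job carries on.

```python
from collections import Counter

MAX_ATTEMPTS = 4  # hypothetical retry limit chosen for this sketch


class WorkerFailure(Exception):
    """Simulates a worker node dying while it runs a task."""


def word_count_task(chunk, attempt, fail_on_attempts):
    # Deterministic failure injection: "die" on the listed attempt numbers.
    if attempt in fail_on_attempts:
        raise WorkerFailure(f"worker lost on attempt {attempt}")
    return Counter(chunk.split())


def run_with_retries(chunk, fail_on_attempts=()):
    """Re-run a single task until it succeeds or the retry limit is hit."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return word_count_task(chunk, attempt, fail_on_attempts), attempt
        except WorkerFailure:
            continue  # only this task is retried; other tasks are untouched
    raise RuntimeError("task failed on every attempt; the job would fail")


def run_job(chunks, failures):
    """Run one task per chunk and merge the partial word counts."""
    total = Counter()
    attempts_used = []
    for index, chunk in enumerate(chunks):
        counts, attempts = run_with_retries(chunk, failures.get(index, ()))
        total += counts
        attempts_used.append(attempts)
    return total, attempts_used


chunks = ["a b a", "b c", "a c c"]
# Task 1 loses its worker on attempts 1 and 2, then succeeds on attempt 3.
total, attempts_used = run_job(chunks, {1: (1, 2)})
print(dict(total), attempts_used)
```

An MPI program under the default error handler would abort every rank on the first lost process, so this kind of recovery would have to be built by the application itself, typically with checkpointing.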