Usually we have a Map/Reduce pair written in Java: a map, which processes the independent chunks that the input dataset is split into, and a reduce, which combines the map results to perform some useful analysis. Hadoop Streaming is a utility that lets us write Map/Reduce applications in any language (Ruby, Python, Bash, etc.) that can read from STDIN (for input) and write to STDOUT (for output).
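
To make the STDIN/STDOUT contract concrete, here is a minimal word-count sketch in Python. It is only an illustration, not a definitive implementation: the function names `map_lines` and `reduce_lines`, and the convention of passing `map` or `reduce` as the first argument, are assumptions made for this example. It relies on the default streaming behaviour, where each record is a line with key and value separated by a tab, and the reducer receives its input sorted by key.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per word. Input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical convention for this sketch: the first argument selects the role.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Saved as, say, wc.py (a made-up filename), this can be tried locally without a cluster by piping a text file through `python wc.py map`, then `sort`, then `python wc.py reduce`. On a cluster, the same two commands would be passed to the streaming jar through its `-mapper` and `-reducer` options, together with `-input` and `-output`; the location of the jar depends on the installation.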