Shuffle and Sort Phase
The shuffle phase is the heart of a MapReduce program and is where the magic happens.
The shuffle is also the area of the codebase where refinements and improvements are continually being made.

The Map Side
- Each Map task writes its output to an in-memory buffer; the default buffer size is 100 MB.
- The buffer size can be tuned via the mapreduce.task.io.sort.mb property.
- When the buffer fills to a certain threshold (80% by default), a background thread starts to spill its contents to disk.
- The threshold can be tuned via the mapreduce.map.sort.spill.percent property.
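The two properties above are set in mapred-site.xml (or per job). A sketch of such a fragment follows; the 200 MB and 0.80 values are illustrative, not recommendations:

```xml
<!-- Size of the in-memory map output buffer, in MB (default 100). -->
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>200</value>
</property>
<!-- Fraction of the buffer at which a background spill starts (default 0.80). -->
<property>
  <name>mapreduce.map.sort.spill.percent</name>
  <value>0.80</value>
</property>
```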
- Before spilling, the thread divides the data into partitions; each partition corresponds to one Reduce task.
- Within every partition, the background thread performs an in-memory sort by key.
- If a combiner function is specified, it is run on the output of the sort.
- Before the Map task finishes, the spill files are merged into a single partitioned and sorted output file.
- The first partition goes to the first Reduce task, and so on.
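The map-side steps above (partition by reducer, sort within each partition, merge the spills) can be sketched in Python. The hash-modulo partitioning mirrors Hadoop's default HashPartitioner, but everything else is a simplified in-memory model, not Hadoop code:

```python
import heapq

def partition(key, num_reducers):
    # Hadoop-style default partitioning: hash of the key
    # modulo the number of reduce tasks.
    return hash(key) % num_reducers

def spill(records, num_reducers):
    # One "spill": bucket records by partition, then sort each
    # partition by key (a real spill writes this to disk).
    parts = [[] for _ in range(num_reducers)]
    for key, value in records:
        parts[partition(key, num_reducers)].append((key, value))
    return [sorted(p) for p in parts]

def merge_spills(spills, num_reducers):
    # Merge the sorted runs of each partition into one sorted
    # stream per partition, as the Map task does before finishing.
    return [list(heapq.merge(*(s[p] for s in spills)))
            for p in range(num_reducers)]
```

All records with the same key land in the same partition, so each Reduce task sees every value for the keys it owns.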
- Compressing the Map output reduces the amount of data transferred to the Reducers. To enable compression, set the mapreduce.map.output.compress property to true.
- The compression codec is specified by the mapreduce.map.output.compress.codec property.
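A config sketch for both properties; Snappy is shown as one common codec choice, not the only option:

```xml
<!-- Compress map output before it is shuffled to the reducers. -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<!-- Codec class to use for the map output. -->
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```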
The Reduce Side
- Copy phase
- By default, 5 threads fetch Map output in parallel; this can be changed via the mapreduce.reduce.shuffle.parallelcopies property.
- Map outputs are copied into the Reduce task's JVM memory if they are small enough.
- The buffer's size is controlled by mapreduce.reduce.shuffle.input.buffer.percent, which specifies the proportion of the heap to use for this purpose.
- The mapreduce.reduce.shuffle.merge.percent property sets the threshold on the in-memory buffer size, and mapreduce.reduce.merge.inmem.threshold sets the threshold on the number of Map outputs held in memory.
- When either threshold is reached, the buffered Map outputs are merged and spilled to disk.
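The either-threshold trigger above can be modeled with a small Python sketch. The property names in the comments are real, but the class itself is a simplified illustration, not Hadoop's implementation:

```python
class ShuffleBuffer:
    """Toy model of the reduce-side in-memory merge/spill trigger."""

    def __init__(self, capacity_bytes, merge_percent=0.66,
                 inmem_threshold=1000):
        # capacity_bytes plays the role of the heap share given by
        # mapreduce.reduce.shuffle.input.buffer.percent;
        # merge_percent ~ mapreduce.reduce.shuffle.merge.percent;
        # inmem_threshold ~ mapreduce.reduce.merge.inmem.threshold.
        self.capacity = capacity_bytes
        self.merge_percent = merge_percent
        self.inmem_threshold = inmem_threshold
        self.used = 0        # bytes currently buffered
        self.outputs = 0     # map outputs currently buffered
        self.spills = 0      # number of merge-and-spill events

    def add_map_output(self, size_bytes):
        self.used += size_bytes
        self.outputs += 1
        # Spill when EITHER the buffer-size threshold OR the
        # map-output-count threshold is reached.
        if (self.used >= self.capacity * self.merge_percent
                or self.outputs >= self.inmem_threshold):
            self._merge_and_spill()

    def _merge_and_spill(self):
        # A real implementation merges the buffered outputs and
        # writes them to disk; here we just reset the counters.
        self.spills += 1
        self.used = 0
        self.outputs = 0
```

Either condition alone is enough to trigger a spill, which is why both properties matter when tuning the reduce side.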