HadoopConceptsNote

Shuffle and Sort Phase

The Shuffle phase is the heart of a MapReduce program and is where the magic happens.

The shuffle is also the area of the codebase where refinements and improvements are continually being made.

[Figure: shuffle and sort]

The Map Side

  • Map output is not written straight to disk; each Map task writes its output to a circular in-memory buffer, whose default size is 100 MB.
  • The buffer size can be tuned with the mapreduce.task.io.sort.mb property.
  • When the contents of the buffer reach a certain threshold, 80% by default, a background thread starts to spill the contents to disk.
  • The threshold can be tuned with the mapreduce.map.sort.spill.percent property.
  • Before spilling, the data is divided into partitions; each partition corresponds to one Reduce task (the default partitioning logic is sketched after this list).
  • Within each partition, the background thread performs an in-memory sort by key.
  • If a Combiner function is set, it is run on the output of the sort.
  • The spill files are merged into a single partitioned and sorted file before the Map task finishes.
  • The first partition is sent to the first Reduce task, and so on.
  • Compressing the Map output can reduce the amount of data transferred to the Reducers. To enable compression, set the mapreduce.map.output.compress property to true (see the configuration sketch after this list).
  • The compression codec is specified by mapreduce.map.output.compress.codec.
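A minimal sketch of setting these Map-side knobs programmatically, assuming the MRv2 property names above; the specific values here (200 MB buffer, 90% spill threshold, Snappy codec) are illustrative choices, not recommendations:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;

    public class MapSideShuffleTuning {
        public static void main(String[] args) {
            Configuration conf = new Configuration();

            // Grow the in-memory sort buffer from the 100 MB default (illustrative value).
            conf.setInt("mapreduce.task.io.sort.mb", 200);

            // Spill to disk when the buffer is 90% full instead of the default 80%.
            conf.setFloat("mapreduce.map.sort.spill.percent", 0.90f);

            // Compress Map output to cut shuffle traffic; Snappy is one common choice.
            conf.setBoolean("mapreduce.map.output.compress", true);
            conf.setClass("mapreduce.map.output.compress.codec",
                    SnappyCodec.class, CompressionCodec.class);
        }
    }

The same properties can equally be set in mapred-site.xml or on the command line with -D; the Configuration API is shown only because it keeps the example self-contained.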
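As for the partition-to-Reduce-task mapping: Hadoop's stock HashPartitioner boils down to the logic below. This is a sketch of the default behaviour, written as a custom Partitioner class for illustration:

    import org.apache.hadoop.mapreduce.Partitioner;

    // Sketch of the default HashPartitioner logic: hash the key, mask off the
    // sign bit so the result is non-negative, then take it modulo the number
    // of Reduce tasks. Records in partition i go to Reduce task i.
    public class HashPartitionerSketch<K, V> extends Partitioner<K, V> {
        @Override
        public int getPartition(K key, V value, int numReduceTasks) {
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }

A job would select a partitioner like this with job.setPartitionerClass(HashPartitionerSketch.class); with the default in place, no such call is needed.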

The Reduce Side

  • Copy phase: each Reduce task starts copying Map outputs as soon as the corresponding Map tasks complete.
  • The default number of threads that fetch Map output is 5; it can be changed with the mapreduce.reduce.shuffle.parallelcopies property.
  • The Map outputs are copied into the Reduce task's JVM memory if they are small enough.
  • The buffer's size is controlled by mapreduce.reduce.shuffle.input.buffer.percent, which specifies the proportion of the heap to use for this purpose.
  • The mapreduce.reduce.shuffle.merge.percent property sets a threshold on the in-memory buffer's usage.
  • The mapreduce.reduce.merge.inmem.threshold property sets a threshold number of Map outputs.
  • When the in-memory buffer reaches the usage threshold, or accumulates the threshold number of Map outputs, it is merged and spilled to disk (both thresholds appear in the sketch after this list).
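A companion sketch for the Reduce-side properties above; the values shown are the documented defaults, so setting them explicitly like this changes nothing and only makes the configuration visible:

    import org.apache.hadoop.conf.Configuration;

    public class ReduceSideShuffleTuning {
        public static void main(String[] args) {
            Configuration conf = new Configuration();

            // Number of threads that fetch Map output in parallel (default 5).
            conf.setInt("mapreduce.reduce.shuffle.parallelcopies", 5);

            // Proportion of the Reduce task's heap used to buffer copied Map outputs.
            conf.setFloat("mapreduce.reduce.shuffle.input.buffer.percent", 0.70f);

            // Merge and spill once the in-memory buffer is this full...
            conf.setFloat("mapreduce.reduce.shuffle.merge.percent", 0.66f);

            // ...or once this many Map outputs have accumulated, whichever comes first.
            conf.setInt("mapreduce.reduce.merge.inmem.threshold", 1000);
        }
    }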