Configure the Number of Parallel Copies per Reducer
The reducer reads map results from every node. It copies map outputs into its own buffers, sorts them, and spills to disk when the buffer fills. The mapred.reduce.parallel.copies parameter in the mapred-site.xml file controls how many map outputs a reducer can copy in parallel. The default is 12. In most cases, you do not need to adjust this parameter. However, in very large clusters you can lower it, if necessary, to throttle the network bandwidth used during the reduce phase.
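For example, a minimal mapred-site.xml excerpt that lowers mapred.reduce.parallel.copies might look like the following. The value 6 is an illustrative assumption, not a recommendation; pick a value based on your own cluster's network capacity. If your mapred-site.xml already has a configuration element, add only the property element inside it rather than a second configuration element.

```xml
<!-- mapred-site.xml (excerpt). Illustrative value only: lowers the number
     of map outputs each reducer copies in parallel from the default of 12
     to 6, which throttles network bandwidth during the reduce phase. -->
<configuration>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>6</value>
  </property>
</configuration>
```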
Configure the Percentage of Completed Map Tasks Before Reduce Tasks Can Start
In some cases, you can configure reducers to start before all map tasks are complete. The mapred.reduce.slowstart.completed.maps parameter controls the fraction of map tasks, expressed as a value between 0 and 1, that must be complete before the reducers start. If this value is set too low, random disk I/O increases and performance suffers.
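As a worked example, the number of map tasks that must finish is the configured fraction multiplied by the job's total number of map tasks. Assuming a hypothetical job with N = 1000 map tasks and mapred.reduce.slowstart.completed.maps set to f = 0.95:

```latex
\[
  \text{map tasks that must complete} = f \times N = 0.95 \times 1000 = 950
\]
```

At f = 0, reducers can be scheduled as soon as the job starts; at f = 1, they wait until every map task has finished.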
If the output of the map tasks is large, set this parameter to 0.95 to account for the overhead of starting the reducers. If the map output is small, you can lower this value. However, if you have a very large cluster or only one wave of map tasks, you may want to set this parameter to 0.
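For jobs with large map output, a mapred-site.xml excerpt applying the 0.95 setting described above might look like this minimal sketch. As with any cluster-wide default, merge the property into your existing configuration element.

```xml
<!-- mapred-site.xml (excerpt). Reducers start only after 95% of a job's
     map tasks have completed, the setting suggested for large map output. -->
<configuration>
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.95</value>
  </property>
</configuration>
```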