Map invocations are distributed across multiple machines by automatically partitioning the input data into a set of M splits. The input splits can be processed in parallel by different machines. Reduce invocations are distributed by partitioning the intermediate key space into R pieces using a partitioning function (e.g., hash(key) % R).
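
As a concrete illustration, here is a minimal sketch of a hash-based partitioning function of the hash(key) % R form just mentioned. The function name PartitionForKey is invented for this sketch; it is not the library's API.

    // A hash-based partitioning sketch: hash(key) % R picks one of the
    // R reduce partitions, and equal keys always pick the same one.
    #include <cstdio>
    #include <functional>
    #include <string>

    int PartitionForKey(const std::string& key, int R) {
      // std::hash yields a size_t; reducing it modulo R selects a bucket in [0, R).
      return static_cast<int>(std::hash<std::string>{}(key) % R);
    }

    int main() {
      std::printf("'cat' -> partition %d of 4\n", PartitionForKey("cat", 4));
      std::printf("'cat' -> partition %d of 4\n", PartitionForKey("cat", 4));  // same bucket
      std::printf("'dog' -> partition %d of 4\n", PartitionForKey("dog", 4));
    }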

Figure 23-1 shows the actions that occur when the user program calls the MapReduce function (the numbered labels in Figure 23-1 correspond, in order, to the steps in the following list).

Figure 23-1. Relationships between processes in MapReduce

The MapReduce library first splits the input files into M pieces (typically 16 megabytes to 64 megabytes per piece). It then starts up many copies of the program on a cluster of machines, by making a request to the cluster scheduling system.
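
To make the arithmetic concrete, the sketch below computes byte ranges for the M splits of a single input file, assuming a fixed target split size. This is only an illustration: a real implementation would also respect record boundaries and handle many input files.

    // Compute half-open byte ranges for the M splits of one file.
    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct Split { int64_t begin; int64_t end; };  // half-open byte range

    std::vector<Split> MakeSplits(int64_t file_size, int64_t split_size) {
      std::vector<Split> splits;
      for (int64_t begin = 0; begin < file_size; begin += split_size) {
        splits.push_back({begin, std::min(begin + split_size, file_size)});
      }
      return splits;
    }

    int main() {
      const int64_t kMB = 1 << 20;
      // A 200 MB file with 64 MB splits yields M = 4 pieces (64+64+64+8 MB).
      std::vector<Split> splits = MakeSplits(200 * kMB, 64 * kMB);
      std::printf("M = %zu splits\n", splits.size());
    }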

One of the copies is special and is called the MapReduce master. The remaining copies are workers that are assigned chunks of Map and Reduce work by the master. There are M map tasks and R reduce tasks. The master picks idle workers and assigns a map and/or a reduce task to each.

A worker that is assigned a map task reads the contents of the corresponding input split. It passes each input record to the user-defined Map function. The intermediate key/value pairs produced by the Map function are buffered in memory.
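
The following sketch shows the shape of that map-side loop, with a word-count-style Map standing in for user code. The names UserMap and KeyValue are invented for illustration, and the in-memory buffer is simply a vector.

    // Map-side loop: pass each record to the user's Map, buffer the output.
    #include <cstdio>
    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    using KeyValue = std::pair<std::string, std::string>;

    // User-defined Map for word counting: emit (word, "1") for every word.
    void UserMap(const std::string& record, std::vector<KeyValue>* buffer) {
      std::istringstream words(record);
      std::string word;
      while (words >> word) buffer->push_back({word, "1"});
    }

    int main() {
      // Stand-in for the records read from one input split.
      std::vector<std::string> split = {"the quick brown fox", "the lazy dog"};

      std::vector<KeyValue> buffer;  // intermediate pairs, buffered in memory
      for (const std::string& record : split) UserMap(record, &buffer);

      std::printf("buffered %zu intermediate pairs\n", buffer.size());  // 7 pairs
    }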

Periodically, the buffered pairs are written to local disk, partitioned into R separate buckets by the partitioning function. When the map task is completed, the worker notifies the master. The master forwards information about the location of the intermediate data generated by this map task to any workers that have been assigned reduce tasks. If there are remaining map tasks, the master assigns one of the remaining tasks to the newly idle worker.
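
A minimal sketch of such a spill might look like the following; the file-name pattern and the tab-separated format are assumptions made purely for illustration, and the partitioning function is the hash(key) % R form mentioned earlier.

    // Periodic spill: append each buffered pair to one of R local files
    // chosen by the partitioning function.
    #include <cstdio>
    #include <fstream>
    #include <functional>
    #include <string>
    #include <utility>
    #include <vector>

    int PartitionForKey(const std::string& key, int R) {
      return static_cast<int>(std::hash<std::string>{}(key) % R);
    }

    void SpillBuffer(const std::vector<std::pair<std::string, std::string>>& buffer,
                     int map_task_id, int R) {
      // One append-mode file per reduce partition, e.g. "map-0003-part-1".
      std::vector<std::ofstream> buckets;
      for (int r = 0; r < R; ++r) {
        char name[64];
        std::snprintf(name, sizeof(name), "map-%04d-part-%d", map_task_id, r);
        buckets.emplace_back(name, std::ios::app);
      }
      for (const std::pair<std::string, std::string>& kv : buffer) {
        buckets[PartitionForKey(kv.first, R)] << kv.first << '\t' << kv.second << '\n';
      }
    }

    int main() {
      SpillBuffer({{"the", "1"}, {"fox", "1"}, {"the", "1"}}, /*map_task_id=*/3, /*R=*/2);
      std::printf("spill complete\n");
    }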

When a reduce worker is told the locations of intermediate data for its reduce task, it issues remote procedure calls to read the buffered intermediate data from the local disk of the map workers. When a reduce worker has finished reading all intermediate data for its reduce task, it sorts it by the intermediate keys so that all occurrences of the same intermediate key are grouped together. If the intermediate data is too large to fit in memory on the reduce worker, an external sort is used.
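
The sketch below shows only the in-memory case: the pairs a reduce worker has gathered are sorted by intermediate key so that equal keys become adjacent.

    // Sort step (in-memory only): group equal keys by sorting the pairs.
    #include <algorithm>
    #include <cstdio>
    #include <string>
    #include <utility>
    #include <vector>

    using KeyValue = std::pair<std::string, std::string>;

    int main() {
      // Stand-in for intermediate data read (via RPC) from several map workers.
      std::vector<KeyValue> fetched = {
          {"the", "1"}, {"fox", "1"}, {"the", "1"}, {"dog", "1"}, {"the", "1"}};

      // std::pair compares by its first element (the key) before the second.
      std::sort(fetched.begin(), fetched.end());

      for (const KeyValue& kv : fetched)
        std::printf("%s\t%s\n", kv.first.c_str(), kv.second.c_str());
      // All three "the" entries are now adjacent at the end of the sequence.
    }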

The reduce worker iterates over the sorted intermediate key/value pairs. For each unique intermediate key encountered, it passes the key and the corresponding list of intermediate values to the user's Reduce function. Any key/value pairs generated by the user's Reduce function are appended to a final output file for this reduce partition. When the reduce task is done, the worker notifies the master. If there are remaining reduce tasks, the master assigns one of the remaining reduce tasks to the newly idle worker.
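
Continuing the previous sketch, the grouping-and-reduce loop might look like this, with a word-count-style Reduce summing counts. The UserReduce name and the output file name are stand-ins invented for illustration.

    // Grouping-and-reduce loop: gather the run of values sharing one key,
    // call the user's Reduce, and append the result to this partition's output.
    #include <cstdio>
    #include <fstream>
    #include <string>
    #include <utility>
    #include <vector>

    using KeyValue = std::pair<std::string, std::string>;

    // User-defined Reduce for word counting: sum the "1" values from Map.
    std::string UserReduce(const std::string& key,
                           const std::vector<std::string>& values) {
      (void)key;  // the key itself is not needed to compute the sum
      int total = 0;
      for (const std::string& v : values) total += std::stoi(v);
      return std::to_string(total);
    }

    int main() {
      // Already sorted by key, as in the previous sketch.
      std::vector<KeyValue> sorted = {
          {"dog", "1"}, {"fox", "1"}, {"the", "1"}, {"the", "1"}, {"the", "1"}};

      std::ofstream out("reduce-output-0000");  // final output for this partition
      std::size_t i = 0;
      while (i < sorted.size()) {
        std::string key = sorted[i].first;
        std::vector<std::string> values;
        while (i < sorted.size() && sorted[i].first == key)
          values.push_back(sorted[i++].second);
        out << key << '\t' << UserReduce(key, values) << '\n';  // e.g. "the\t3"
      }
      std::printf("reduce task done\n");
    }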

When all map tasks and reduce tasks have been completed, the MapReduce function call in the user program returns, giving control back to the user code. At this point, the output of the MapReduce job is available in the R output files (one file per reduce task).

Several details of the implementation allow it to perform well in our environment.

Load balancing

A MapReduce job typically has many more tasks than machines, which means that each worker will be assigned many different tasks by the master. The master assigns a new task to a machine when it finishes its previous task. This means that a faster machine will be assigned more tasks than a slower machine. Therefore, the assignment of tasks to machines is properly balanced even in a heterogeneous environment, and workers tend to be kept busy with useful work throughout the computation.
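
The toy simulation below (not the actual scheduler) illustrates why this works: the master always hands the next task to whichever worker becomes idle first, so a machine that finishes tasks faster simply ends up running more of them.

    // Dynamic assignment simulation: faster workers receive more tasks.
    #include <cstdio>
    #include <vector>

    int main() {
      const int kNumTasks = 100;
      std::vector<int> task_cost = {1, 2, 4};  // time per task; worker 0 is fastest
      std::vector<int> idle_at = {0, 0, 0};    // time at which each worker is next idle
      std::vector<int> assigned = {0, 0, 0};   // how many tasks each worker has run

      for (int task = 0; task < kNumTasks; ++task) {
        // Pick the worker that frees up first and give it the next task.
        int w = 0;
        for (int i = 1; i < (int)idle_at.size(); ++i)
          if (idle_at[i] < idle_at[w]) w = i;
        idle_at[w] += task_cost[w];
        ++assigned[w];
      }
      for (int i = 0; i < 3; ++i)
        std::printf("worker %d ran %d tasks\n", i, assigned[i]);
      // The fastest worker ends up with roughly 4x the tasks of the slowest.
    }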

Fault tolerance

Because this implementation of MapReduce is designed to run jobs distributed across hundreds or thousands of machines, the library must transparently handle machine failures.

The master keeps state about which map and reduce tasks have been done by which workers. The master periodically sends a ping remote procedure call to each worker. If a worker does not respond to several consecutive pings, the master declares that worker dead and assigns any map or reduce tasks that were in progress on it to other workers for re-execution.
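
A sketch of that failure-detection rule is shown below. The threshold of three missed pings, the Ping stand-in, and the data structures are assumptions made for illustration; in the real system the ping is a remote procedure call rather than a local function.

    // Ping-based failure detection: after several consecutive missed pings,
    // mark the worker dead and re-queue its tasks for execution elsewhere.
    #include <cstdio>
    #include <deque>
    #include <vector>

    const int kMaxMissedPings = 3;

    struct Worker {
      int missed_pings = 0;
      bool dead = false;
      std::vector<int> current_tasks;  // tasks in progress on this worker
    };

    // Stand-in for the ping RPC: returns false if the worker did not respond.
    bool Ping(int worker_id) { return worker_id != 1; }  // pretend worker 1 hangs

    int main() {
      std::vector<Worker> workers(3);
      workers[1].current_tasks = {7, 8};  // worker 1 is running tasks 7 and 8
      std::deque<int> pending_tasks;      // tasks waiting to be (re)assigned

      for (int round = 0; round < 5; ++round) {  // the periodic ping loop
        for (int w = 0; w < (int)workers.size(); ++w) {
          if (workers[w].dead) continue;
          workers[w].missed_pings = Ping(w) ? 0 : workers[w].missed_pings + 1;
          if (workers[w].missed_pings >= kMaxMissedPings) {
            workers[w].dead = true;  // declare the worker dead
            for (int task : workers[w].current_tasks)
              pending_tasks.push_back(task);  // reschedule its tasks
            workers[w].current_tasks.clear();
          }
        }
      }
      std::printf("%zu tasks queued for re-execution\n", pending_tasks.size());  // 2
    }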
