Beautiful Code [264]
Distributed Programming with MapReduce > Appendix: Word Count Solution
23.9. Appendix: Word Count Solution
This section contains the full C++ implementation of the word frequency counting example that was used in the early part of this chapter. The code can also be found on the O'Reilly web site for this book (http://www.oreilly.com/catalog/9780596510046):
Code View: Scroll / Show All
#include "mapreduce/mapreduce.h"
// User's map function
class WordCounter : public Mapper {
public:
virtual void Map(const MapInput& input) {
const string& text = input.value();
const int n = text.size();
for (int i = 0; i < n; ) {
// Skip past leading whitespace
while ((i < n) && isspace(text[i]))
i++;
// Find word end
int start = i;
while ((i < n) && !isspace(text[i]))
i++;
if (start < i)
EmitIntermediate(text.substr(start,i-start),"1");
}
}
};
REGISTER_MAPPER(WordCounter);
// User's reduce function
class Adder : public Reducer {
virtual void Reduce(ReduceInput* input) {
// Iterate over all entries with the
// same key and add the values
int64 value = 0;
while (!input->done()) {
value += StringToInt(input->value());
input->NextValue();
}
// Emit sum for input->key()
Emit(IntToString(value));
}
};
REGISTER_REDUCER(Adder);
int main(int argc, char** argv) {
ParseCommandLineFlags(argc, argv);
MapReduceSpecification spec;
// Store list of input files into "spec"
for (int i = 1; i < argc; i++) {
MapReduceInput* input = spec.add_input();
input->set_format("text");
input->set_filepattern(argv[i]);
input->set_mapper_class("WordCounter");
}
// Specify the output files:
// /gfs/test/freq-00000-of-00100
// /gfs/test/freq-00001-of-00100
// ...
MapReduceOutput* out = spec.output();
out->set_filebase("/gfs/test/freq");
out->set_num_tasks(100);
out->set_format("text");
out->set_reducer_class("Adder");
// Optional: do partial sums within map
// tasks to save network bandwidth
out->set_combiner_class("Adder");
// Tuning parameters: use at most 2,000
// machines and 100 MB of memory per task
spec.set_machines(2000);
spec.set_map_megabytes(100);
spec.set_reduce_megabytes(100);
// Now run it
MapReduceResult result;
if (!MapReduce(spec, &result)) abort( );
// Done: 'result' structure contains info
// about counters, time taken, number of
// machines used, etc.
return 0;
}
Beautiful Concurrency > A Simple Example: Bank Accounts
24. Beautiful Concurrency
Simon Peyton Jones
The free lunch is over.[*] We have grown used to the idea that our programs will go faster when we buy a next-generation processor, but that time has passed. While that next-generation chip will have more CPUs, each individual CPU will be no faster than the previous year's model. If we want our programs to run faster, we must learn to write parallel programs.[]
[*] Herb Sutter, "The free lunch is over: a fundamental turn toward concurrency in software," Dr. Dobb's Journal, March 2005.
[] Herb Sutter and James Larus, "Software and the concurrency revolution," ACM Queue, Vol. 3, No. 7, September 2005.
Parallel programs execute in a nondeterministic way, so they are hard to test, and bugs can be almost impossible to reproduce. For me, a beautiful program is one that is so simple and elegant that it obviously has no mistakes, rather than merely having no obvious mistakes.[] If we want to write parallel programs that work reliably, we must pay particular attention to beauty. Sadly, parallel programs are often less beautiful than their sequential cousins; in particular they are, as we shall see, less modular.
[] This turn of phrase is due to Tony Hoare.
In this chapter, I'll describe Software Transactional Memory (STM), a promising new approach to programming shared-memory parallel processors that seems to support modular programs in a way that current technology does not. By the time we are done, I hope you will be as