In figure 17.4, I have plotted the to-be-or-not-to-be word frequency as a function of rank. The shape of the plot indeed looks like a power law. If the text I had chosen had been larger, the graph would have looked even more power-law-ish.
Zipf analyzed large amounts of text in this way (without the help of computers!) and found that, given a large text, the frequency of a word is approximately proportional to the inverse of its rank (i.e., 1/rank). This is a power law, with exponent −1. The second highest ranked word will appear about half as often as the first, the third about one-third as often, and so forth. This relation is now called Zipf’s law, and is perhaps the most famous of known power laws.
FIGURE 17.4. An illustration of Zipf’s law using Shakespeare’s “To be or not to be” monologue.
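To see how such a plot is made, here is a minimal Python sketch (my own, not from the book) that counts the words in a text, ranks them by frequency, and compares each frequency with the 1/rank falloff that Zipf’s law predicts. The sample string is just a stand-in; any sizable text works the same way.

```python
from collections import Counter

def rank_frequency(text):
    """Return (word, frequency) pairs, most frequent first."""
    words = text.lower().split()
    return Counter(words).most_common()

# A toy stand-in for the monologue; a larger text gives a cleaner power law.
text = "to be or not to be that is the question"
ranked = rank_frequency(text)

top_freq = ranked[0][1]  # frequency of the rank-1 word
for rank, (word, freq) in enumerate(ranked, start=1):
    predicted = top_freq / rank  # Zipf's law: frequency proportional to 1/rank
    print(f"rank {rank:2d}  {word:8s}  actual {freq}  Zipf predicts ~{predicted:.1f}")
```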
There have been many different explanations proposed for Zipf’s law. Zipf himself proposed that, on the one hand, people in general operate by a “Principle of Least Effort”: once a word has been used, it takes less effort to use it again for similar meanings than to come up with a different word. On the other hand, people want language to be unambiguous, which they can accomplish by using different words for similar but nonidentical meanings. Zipf showed mathematically that these two pressures working together could produce the observed power-law distribution.
In the 1950s, Benoit Mandelbrot, of fractal fame, had a somewhat different explanation, in terms of information content. Following Claude Shannon’s formulation of information theory (cf. chapter 3), Mandelbrot considered a word as a “message” being sent from a “source” who wants to maximize the amount of information while minimizing the cost of sending that information. For example, the words feline and cat mean the same thing, but the latter, being shorter, costs less (or takes less energy) to transmit. Mandelbrot showed that if the information content and transmission costs are simultaneously optimized, the result is Zipf’s law.
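The skeleton of Mandelbrot’s optimization can be written in a few lines; the notation here is mine, chosen for a compact sketch rather than taken from his papers. Let the word of rank j have transmission cost c_j, let C be the average cost per word and H the Shannon information per word, and choose the word probabilities p_j to minimize the cost per unit of information, C/H:

```latex
% subject to the normalization constraint \sum_j p_j = 1
C = \sum_j p_j c_j, \qquad H = -\sum_j p_j \ln p_j,
\qquad \frac{\partial}{\partial p_j}\!\left(\frac{C}{H}\right) = 0
\;\Longrightarrow\; p_j \propto e^{-(H/C)\,c_j}.
```

If shorter words are cheaper, the rank-j word costs roughly c_j ∝ ln j (short words run out, so cost grows logarithmically with rank), and substituting gives p_j ∝ j^(−k) for some constant k: a power law in rank, which for k near 1 is Zipf’s law.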
At about the same time, Herbert Simon proposed yet another explanation, presaging the notion of preferential attachment. Simon envisioned a person adding words one at a time to a text. He proposed that at any time, the probability of that person reusing a word is proportional to that word’s current frequency in the text. All words that have not yet appeared have the same, nonzero probability of being added. Simon showed that this process results in text that follows Zipf’s law.
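Simon’s process is simple enough to simulate directly. The sketch below is my own rendering of his description; the new-word probability alpha is an assumed free parameter, not a value from the book.

```python
import random
from collections import Counter

def simon_text(n_words, alpha=0.1, seed=0):
    """Grow a text one word at a time, following Simon's process:
    with probability alpha add a never-before-used word; otherwise
    reuse an old word with probability proportional to its current
    frequency (sampling a uniformly random earlier token does this,
    since a word occurring k times occupies k slots in the list)."""
    rng = random.Random(seed)
    text = [0]      # words are just integer IDs; 0 is the first word
    next_id = 1
    for _ in range(n_words - 1):
        if rng.random() < alpha:
            text.append(next_id)
            next_id += 1
        else:
            text.append(rng.choice(text))
    return text

# Frequencies of the resulting "words" follow a Zipf-like power law.
ranked = Counter(simon_text(100_000)).most_common()
print(ranked[:5])
```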
Evidently Mandelbrot and Simon had a rather heated argument (via dueling letters to the journal Information and Control) about whose explanation was correct.
Finally, also around the same time, to everyone’s amusement or chagrin, the psychologist George Miller showed, using simple probability theory, that the text generated by monkeys typing randomly on a keyboard, ending a word every time they (randomly) hit the space bar, will follow Zipf’s law as well.
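This, too, is easy to check by simulation. The sketch below (mine, not Miller’s) types characters at random, treating the space bar as a word boundary; strictly speaking the result is a stepwise distribution (all words of a given length are equally probable), but on a log-log plot of frequency against rank it hugs a straight Zipf-like line.

```python
import random
from collections import Counter

def monkey_text(n_keystrokes, alphabet="abcde", space_prob=1/6, seed=1):
    """Random typing: each keystroke is a space (ending the current
    word) with probability space_prob, else a uniformly chosen letter."""
    rng = random.Random(seed)
    keys = [" " if rng.random() < space_prob else rng.choice(alphabet)
            for _ in range(n_keystrokes)]
    return "".join(keys).split()   # split() drops empty "words"

ranked = Counter(monkey_text(200_000)).most_common()
for rank in (1, 2, 4, 8, 16, 32):
    word, freq = ranked[rank - 1]
    print(f"rank {rank:2d}  {word!r}  frequency {freq}")
```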
The many explanations of Zipf’s law proposed in the 1930s through the 1950s epitomize the arguments going on at present concerning the physical or informational mechanisms giving rise to power laws in nature. Understanding power-law distributions, their origins, their significance, and their commonalities across disciplines is currently a very important open problem in many areas of complex systems research. It is an issue I’m sure you will hear more about as the science behind these laws becomes clearer.
CHAPTER 18
Evolution, Complexified
IN CHAPTER 1 I asked, “How did evolution produce creatures with such an enormous contrast between their individual simplicity and their collective sophistication?” Indeed, as illustrated by the examples we’ve seen in this book, the closer one looks at living systems, the more astonishing it seems that such intricate complexity could have been formed by the gradual accumulation of favorable mutations or the whims of historical accident. This very argument