—for example, using some bits as checks on others. It put teeth into the crucial notion of “redundancy.” In terms of Shannon’s information theory, ordinary language contains more than fifty percent redundancy in the form of sounds or letters that are not strictly necessary to conveying a message. This is a familiar idea; ordinary communication in a world of mumblers and typographical errors depends on redundancy. The famous advertisement for shorthand training—if u cn rd ths msg…—illustrated the point, and information theory allowed it to be measured. Redundancy is a predictable departure from the random.

Part of the redundancy in ordinary language lies in its meaning, and that part is hard to quantify, depending as it does on people’s shared knowledge of their language and the world. This is the part that allows people to solve crossword puzzles or fill in the missing word at the end of a thought. But other kinds of redundancy lend themselves more easily to numerical measures. Statistically, the likelihood that any letter in English will be “e” is far greater than one in twenty-six. Furthermore, letters do not have to be counted as isolated units. Knowing that one letter in an English text is “t” helps in predicting that the next might be “h” or “o,” and knowing two letters helps even more, and so on. The statistical tendency of various two- and three-letter combinations to turn up in a language goes a long way toward capturing some characteristic essence of the language. A computer guided only by the relative likelihood of the possible sequences of three letters can produce an otherwise random stream of nonsense that is recognizably English nonsense. Cryptologists have long made use of such statistical patterns in breaking simple codes. Communications engineers now use them in devising techniques to compress data, removing the redundancy to save space on a transmission line or storage disk.

To Shannon, the right way to look at such patterns was this: a stream of data in ordinary language is less than random; each new bit is partly constrained by the bits before; thus each new bit carries somewhat less than a bit’s worth of real information. There was a hint of paradox floating in this formulation. The more random a data stream, the more information would be conveyed by each new bit.
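Both measurements can be made concrete with a small Python sketch (an illustration added here, not from the book): it estimates how many bits each letter of a passage actually carries, compared with the log2(26) ≈ 4.7 bits of a perfectly random alphabet, and it generates "English nonsense" guided only by the relative likelihood of three-letter sequences. The sample passage and the function and variable names are illustrative assumptions.

```python
# Illustrative sketch, not from the book: per-letter entropy and
# trigram-driven "English nonsense".  The sample text stands in
# for any sizeable body of English prose.
import math
import random
from collections import Counter, defaultdict

sample = (
    "this is the sort of ordinary english prose that the model learns from "
    "the statistics of three letter sequences go a long way toward capturing "
    "the characteristic sound of the language even when the output is nonsense"
)

# Single-letter entropy: how many bits each letter actually carries,
# versus the log2(26) of a perfectly random alphabet.
letters = Counter(c for c in sample if c.isalpha())
total = sum(letters.values())
entropy = -sum(n / total * math.log2(n / total) for n in letters.values())
print(f"bits per letter: {entropy:.2f}  (random alphabet: {math.log2(26):.2f})")

# Trigram model: for each two-letter context, count which letter follows it.
model = defaultdict(Counter)
for i in range(len(sample) - 2):
    model[sample[i:i + 2]][sample[i + 2]] += 1

def generate(length=200, seed="th"):
    """Emit characters weighted only by the observed trigram counts."""
    out = list(seed)
    while len(out) < length:
        counts = model.get("".join(out[-2:])) or model[seed]  # restart on an unseen context
        chars, weights = zip(*counts.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

print(generate())
```

Even with a toy sample, the estimated entropy falls short of 4.7 bits per letter, and the generated stream, while nonsense, is recognizably English-flavored nonsense.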
Beyond its technical aptness at the beginning of the computer era, Shannon’s information theory gained a modest philosophical stature, and a surprising part of the theory’s appeal to people beyond Shannon’s field could be attributed to the choice of a single word: entropy. As Warren Weaver put it in a classic exposition of information theory, “When one meets the concept of entropy in communication theory, he has a right to be rather excited—a right to suspect that one has hold of something that may turn out to be basic and important.”

The concept of entropy comes from thermodynamics, where it serves as an adjunct of the Second Law, the inexorable tendency of the universe, and any isolated system in it, to slide toward a state of increasing disorder. Divide a swimming pool in half with some barrier; fill one half with water and one with ink; wait for all to be still; lift the barrier; simply through the random motion of molecules, eventually the ink and water will mix. The mixing never reverses itself, even if you wait till the end of the universe, which is why the Second Law is so often said to be the part of physics that makes time a one-way street. Entropy is the name for the quality of systems that increases under the Second Law—mixing, disorder, randomness.

The concept is easier to grasp intuitively than to measure in any real-life situation. What would be a reliable test for the level of mixing of two substances? One could imagine counting the molecules of each in some sample. But what if they were arranged yes-no yes-no yes-no yes-no? Entropy could hardly be described as high. One could count just the even-numbered molecules, but what if the arrangement were yes-no no-yes yes-no no-yes? Order intrudes in ways that defy any straightforward counting algorithm.
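The trap can be seen in a small sketch (again an illustration added here, not from the book) that treats "count the molecules" as a Shannon block-entropy test on three arrangements: strictly alternating, the yes-no no-yes pattern, and a genuinely scrambled one. The sequence lengths and labels are arbitrary assumptions.

```python
# Illustrative sketch, not from the book: block entropy as a "mixing test".
# Every fixed block size is fooled by some sufficiently orderly arrangement.
import math
import random
from collections import Counter

def block_entropy(seq, k):
    """Shannon entropy (bits) of the overlapping length-k blocks of seq."""
    blocks = Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))
    total = sum(blocks.values())
    return -sum(n / total * math.log2(n / total) for n in blocks.values())

n = 4096
arrangements = {
    "alternating (yes-no yes-no ...)": [1, 0] * (n // 2),
    "period four (yes-no no-yes ...)": [1, 0, 0, 1] * (n // 4),
    "scrambled":                       [random.randint(0, 1) for _ in range(n)],
}

for name, seq in arrangements.items():
    bits = [round(block_entropy(seq, k), 2) for k in (1, 2, 4)]
    print(f"{name:35s} entropy at block sizes 1, 2, 4: {bits}")
```

Counting single molecules (block size 1) reports maximal entropy for all three arrangements; counting pairs unmasks the alternating one but not the yes-no no-yes pattern, which only betrays itself at blocks of four, and a still cleverer arrangement would slip past that test as well.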