Online Book Reader

Home Category

The Information - James Gleick [166]

By Root 911 0
about complexity.

The following bitstream (or number) is not very complex, because it is rational:

D: 14285714285714285714285714285714285714285714285714…

It may be rephrased concisely as “PRINT 142857 AND REPEAT,” or even more concisely as “1/7.” If it is a message, the compression saves keystrokes. If it is an incoming stream of data, the observer may recognize a pattern, grow more and more confident, and settle on one-seventh as a theory for the data.

In contrast, this sequence contains a late surprise:

E: 10101010101010101010101010101010101010101010101013

The telegraph operator (or theorist, or compression algorithm) must pay attention to the whole message. Nonetheless, the extra information is minimal; the message can still be compressed, wherever pattern exists. We may say it contains a redundant part and an arbitrary part.

It was Shannon who first showed that anything nonrandom in a message allows compression:

F: 101101011110110110101110101110111101001110110100111101110

Heavy on ones, light on zeroes, this might be emitted by the flip of a biased coin. Huffman coding and other such algorithms exploit statistical regularities to compress the data. Photographs are compressible because of their subjects’ natural structure: light pixels and dark pixels come in clusters; statistically, nearby pixels are likely to be similar; distant pixels are not. Video is even more compressible, because the differences between one frame and the next are relatively slight, except when the subject is in fast and turbulent motion. Natural language is compressible because of redundancies and regularities of the kind Shannon analyzed. Only a wholly random sequence remains incompressible: nothing but one surprise after another.

Random sequences are “normal”—a term of art meaning that on average, in the long run, each digit appears exactly as often as the others, one time in ten; and each pair of digits, from 00 to 99, appears one time in a hundred; and each triplet likewise, and so on. No string of any particular length is more likely to appear than any other string of that length. Normality is one of those simple-seeming ideas that, when mathematicians look closely, turn out to be covered with thorns. Even though a truly random sequence must be normal, the reverse is not necessarily the case. A number can be statistically normal yet not random at all. David Champernowne, a young Cambridge friend of Turing’s, invented (or discovered) such a creature in 1933—a construction made of all the integers, chained together in order:

G: 12345678910111213141516171819202122232425262728293…

It is easy to see that each digit, and each combination of digits, occurs equally often in the long run. Yet the sequence could not be less random. It is rigidly structured and completely predictable. If you know where you are, you know what comes next.

Even apart from freaks like Champernowne’s, it turns out that normal numbers are difficult to recognize. In the universe of numbers, normality is the rule; mathematicians know for sure that almost all numbers are normal. The rational numbers are not normal, and there are infinitely many rational numbers, but they are infinitely outnumbered by normal numbers. Yet, having settled the great and general question, mathematicians can almost never prove that any particular number is normal. This in itself is one of the more remarkable oddities of mathematics.

Even Π retains some mysteries:

C: 3.1415926535897932384626433832795028841971693993751…

The world’s computers have spent many cycles analyzing the first trillion or so known decimal digits of this cosmic message, and as far as anyone can tell, they appear normal. No statistical features have been discovered—no biases or correlations, local or remote. It is a quintessentially nonrandom number that seems to behave randomly. Given the nth digit, there is no shortcut for guessing the nth plus one. Once again, the next bit is always a surprise.

How much information, then, is represented by this string of digits? Is it information rich, like

Return Main Page Previous Page Next Page

®Online Book Reader