The Information - James Gleick [110]
Shannon demonstrated one way to do this, an algorithm that exploits differing probabilities of different symbols. And he delivered a stunning package of fundamental results. One was a formula for channel capacity, the absolute speed limit of any communication channel (now known simply as the Shannon limit). Another was the discovery that, within that limit, it must always be possible to devise schemes of error correction that will overcome any level of noise. The sender may have to devote more and more bits to correcting errors, making transmission slower and slower, but the message will ultimately get through. Shannon did not show how to design such schemes; he only proved that it was possible, thereby inspiring a future branch of computer science. “To make the chance of error as small as you wish? Nobody had thought of that,” his colleague Robert Fano recalled years later. “How he got that insight, how he came to believe such a thing, I don’t know. But almost all modern communication theory is based on that work.”♦ Whether removing redundancy to increase efficiency or adding redundancy to enable error correction, the encoding depends on knowledge of the language’s statistical structure to do the encoding. Information cannot be separated from probabilities. A bit, fundamentally, is always a coin toss.
If the two sides of a coin were one way of representing a bit, Shannon offered a more practical hardware example as well:
A device with two stable positions, such as a relay or a flip-flop circuit, can store one bit of information. N such devices can store N bits, since the total number of possible states is 2N and log22N = N.
Shannon had seen devices—arrays of relays, for example—that could store hundreds, even thousands of bits. That seemed like a great many. As he was finishing his write-up, he wandered one day into the office of a Bell Labs colleague, William Shockley, an Englishman in his thirties. Shockley belonged to a group of solid-state physicists working on alternatives to vacuum tubes for electronics, and sitting on his desk was a tiny prototype, a piece of semiconducting crystal. “It’s a solid-state amplifier,” Shockley told Shannon.♦ At that point it still needed a name.
One day in the summer of 1949, before the book version of The Mathematical Theory of Communication appeared, Shannon took a pencil and a piece of notebook paper, drew a line from top to bottom, and wrote the powers of ten from 100 to 1013. He labeled this axis “bits storage capacity.”♦ He began listing some items that might be said to “store” information. A digit wheel, of the kind used in a desktop adding machine—ten decimal digits—represents just over 3 bits. At just under 103 bits, he wrote “punched card (all config. allowed).” At 104 he put “page single spaced typing (32 possible symbols).” Near 105 he wrote something offbeat: “genetic constitution of man.” There was no real precedent for this in current scientific thinking. James D. Watson was a twenty-one-year-old student of zoology in Indiana; the discovery of the structure of DNA lay several years in the future. This was the first time anyone suggested the genome was an information store measurable in bits. Shannon’s guess was conservative, by at least four orders of magnitude. He thought a “phono record (128 levels)” held more information: about 300,000 bits. To the 10 million level he assigned a thick professional journal (Proceedings of the Institute of Radio Engineers) and to 1 billion the Encyclopaedia Britannica. He estimated one hour of broadcast television at 1011 bits and one hour of “technicolor movie” at more than a trillion. Finally, just under his pencil mark for 1014, 100 trillion bits, he put the largest information stockpile he could think of: the Library of Congress.
(Illustration credit 7.5)
* * *
♦ Toward the end of his life Gödel wrote, “It was only by Turing’s work that it became completely clear, that my proof is applicable to every formal system containing arithmetic.”
♦ “not considering statistical structure over greater distances