Complexity: A Guided Tour - Melanie Mitchell
FIGURE 3.4. Top: Information content (zero) of Nicky’s conversation with Grandma. Bottom: Higher information content of Jake’s conversation with Grandma. (Drawings by David Moser.)
Shannon asked, “How much information is transmitted by a source sending messages to a receiver?” In analogy with Boltzmann’s ideas, Shannon defined the information of a macrostate (here, a source) as a function of the number of possible microstates (here, ensembles of possible messages) that could be sent by that source. When my son Nicky was barely a toddler, I would put him on the phone to talk with Grandma. He loved to talk on the phone, but could say only one word—“da.” His messages to Grandma were “da da da da da….” In other words, the Nicky-macrostate had only one possible microstate (sequences of “da”s), and although the macrostate was cute, the information content was, well, zero. Grandma knew just what to expect. My son Jake, two years older, also loved to talk on the phone but had a much bigger vocabulary and would tell Grandma all about his activities, projects, and adventures, constantly surprising her with his command of language. Clearly the information content of the Jake-source was much higher, since so many microstates—i.e., more different collections of messages—could be produced.
Shannon’s definition of information content was nearly identical to Boltzmann’s more general definition of entropy. In his classic 1948 paper, Shannon defined the information content in terms of the entropy of the message source. (This notion of entropy is often called Shannon entropy to distinguish it from the related definition of entropy given by Boltzmann.)
People have sometimes characterized Shannon’s definition of information content as the “average amount of surprise” a receiver experiences on receiving a message, in which “surprise” means something like the “degree of uncertainty” the receiver had about what the source would send next. Grandma is clearly more surprised at each word Jake says than at each word Nicky says, since she already knows exactly what Nicky will say next but can’t as easily predict what Jake will say next. Thus each word Jake says gives her a higher average “information content” than each word Nicky says.
In general, in Shannon’s theory, a message can be any unit of communication, be it a letter, a word, a sentence, or even a single bit (a zero or a one). Once again, the entropy (and thus information content) of a source is defined in terms of message probabilities and is not concerned with the “meaning” of a message.
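The difference in "average surprise" between the two sources can be made concrete with a short sketch of Shannon's entropy formula, H = -Σ p·log₂(p), summed over the probabilities of the possible messages. The word probabilities for the Jake-like source below are invented purely for illustration:

```python
import math

def shannon_entropy(probs):
    """Average information content of a source, in bits per message:
    H = -sum(p * log2(p)) over the probability of each possible message."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Nicky's source: only one possible message, sent with probability 1.
nicky = {"da": 1.0}

# A hypothetical Jake-like source: several possible words, some more
# likely than others (probabilities invented for this example).
jake = {"truck": 0.4, "dig": 0.3, "paint": 0.2, "grandma": 0.1}

print(shannon_entropy(nicky.values()))  # 0.0 bits -- Grandma is never surprised
print(shannon_entropy(jake.values()))   # roughly 1.85 bits per word
```

A source that can send only one message carries zero entropy, matching the Nicky example; the more possible messages, and the more evenly their probabilities are spread, the higher the entropy, which is why the Jake-source carries more information per word.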
Shannon’s results set the stage for applications in many different fields. The best-known applications are in the field of coding theory, which deals with both data compression and the way codes need to be structured to be reliably transmitted. Coding theory affects nearly all of our electronic communications; cell phones, computer networks, and the global positioning system are a few examples.
Information theory is also central in cryptography and in the relatively new field of bioinformatics, in which entropy and other information theory measures are used to analyze patterns in gene sequences. It has also been applied to the analysis of language and music and in psychology, statistical inference, and artificial intelligence, among many other fields. Although information theory was inspired by notions of entropy in thermodynamics and statistical mechanics, it is controversial whether information theory has had much of a reverse impact on those and other fields of physics. In 1961, communications engineer and writer John Pierce quipped that “efforts to marry communication