Complexity_ A Guided Tour - Melanie Mitchell [47]
The physicist Seth Lloyd published a paper in 2001 proposing three different dimensions along which to measure the complexity of an object or process:
How hard is it to describe?
How hard is it to create?
What is its degree of organization?
Lloyd then listed about forty measures of complexity that had been proposed by different people, each of which addressed one or more of these three questions using concepts from dynamical systems, thermodynamics, information theory, and computation. Now that we have covered the background for these concepts, I can sketch some of these proposed definitions.
To illustrate these definitions, let’s use the example of comparing the complexity of the human genome with the yeast genome. The human genome contains approximately three billion base pairs (i.e., pairs of nucleotides). It has been estimated that humans have about 25,000 genes—that is, regions that code for proteins. Surprisingly, only about 2% of base pairs are actually parts of genes; the nongene parts of the genome are called noncoding regions. The noncoding regions have several functions: some of them help keep their chromosomes from falling apart; some help control the workings of actual genes; some may just be “junk” that doesn’t really serve any purpose, or has some function yet to be discovered.
I’m sure you’ve heard of the Human Genome project, but you may not know that there was also a Yeast Genome Project, in which the complete DNA sequences of several varieties of yeast were determined. The first variety that was sequenced turned out to have approximately twelve million base pairs and six thousand genes.
Complexity as Size
One simple measure of complexity is size. By this measure, humans are about 250 times as complex as yeast if we compare the number of base pairs, but only about four times as complex if we count genes.
Since 250 is a pretty big number, you may now be feeling rather complex, at least as compared with yeast. However, disappointingly, it turns out that the amoeba, another type of single-celled microorganism, has about 225 times as many base pairs as humans do, and a mustard plant called Arabidopsis has about the same number of genes that we do.
Humans are obviously more complex than amoebae or mustard plants, or at least I would like to think so. This means that genome size is not a very good measure of complexity; our complexity must come from something deeper than our absolute number of base pairs or genes (See figure 7.1).
Complexity as Entropy
Another proposed measure of the complexity of an object is simply its Shannon entropy, defined in chapter 3 to be the average information content or “amount of surprise” a message source has for a receiver. In our example, we could define a message to be one of the symbols A, C, G, or T. A highly ordered and very easy-to-describe sequence such as “A A A A A A A… A” has entropy equal to zero. A completely random sequence