Online Book Reader


Reinventing Discovery: The New Era of Networked Science - Michael Nielsen [60]

in some detail an example from biology of how it works. The example shows how we can use clever algorithms and the scientific information commons to do something remarkable: find the genome of a human being.

To understand the example, we first need to recall a little background about genetics. As you know, inside each of the cells in your body are many strands of the DNA molecule. Those strands of DNA carry information, and the information they carry is the design for you. To understand how DNA carries this information, recall the double helix structure of DNA. The helices are beautiful and memorable, but the information isn’t stored in the helices, per se. Rather, it is stored in between the helices. Every third of a nanometer or so as you move up the double helix there is a pair of molecules joining the two sides of the helix, called a base pair. It’s a pair of special little mini-molecules that bond to one another, and to the backbones of the double helix. There are four types of base molecule, called adenine, cytosine, guanine, and thymine. Their names are usually just shortened to A, C, G, and T. The A bonds to the T and the C to the G, so the possible pairs are A-T and C-G.

You can thus describe the information in a single strand of DNA through a long sequence of letters (say, CGTCAAGG . . .) representing the bases bonded to one side of the helix (the other side will have complementary bases, GCAGTTCC . . .). That sequence is a description of your basic architecture. How exactly it specifies that architecture is still only partially understood, but everything we know suggests the sequence of DNA base pairs is the blueprint for our design.
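
The pairing rule can be made concrete with a few lines of Python. This is only an illustrative sketch: the names `COMPLEMENT` and `complementary_strand` are made up for the example, and it pairs bases position by position, as the text does, without modeling anything else about how the two strands are read.

```python
# Pairing rule from the text: A bonds to T, and C bonds to G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(bases: str) -> str:
    """Return the bases found opposite `bases` on the other side of the helix."""
    return "".join(COMPLEMENT[base] for base in bases)


# The example sequence from the text.
print(complementary_strand("CGTCAAGG"))  # GCAGTTCC
```

Applying the function twice returns the original sequence, which is one way of seeing that either side of the helix carries the same information.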

How do we figure out the DNA sequence for a person? In fact, if we start with a fragment of DNA that is just a few hundred base pairs long, then it can be directly sequenced using clever old-school chemistry—essentially, one scientist, in their lab, carefully mixing chemicals. But if the DNA strand is much longer than that, then the problem of sequencing gets more complex. A typical strand of human DNA contains several hundred million base pairs, far too long to be sequenced directly. But there is a clever way of combining direct sequencing of short DNA strands with data-driven intelligence to figure out the full DNA sequence.

To understand how it works, imagine that I gave you a copy of the first Harry Potter book, Harry Potter and the Philosopher’s Stone. But instead of giving you an ordinary copy of the book, I’ve taken a pair of scissors and cut the book into tiny little fragments. For example, the opening of the book might be cut up into these fragments:

“Mr. and Mrs. Dursley, of number four, Priv”;

“et Drive, were proud to say that they we”;

“re perfectly normal, thank you very much.”


And so on. I’ve simplified things a bit here by showing the fragments in the same order they appear at the beginning of the book. But I want you to imagine that I’ve given them to you in the wrong order, all scrambled up. At the same time, imagine I have also given you a second copy of the book, also cut up into small fragments, but in a different way:

“Mr. and Mrs. Dursley, of num”;

“ber four, Privet Drive, were proud to”;

“say that they were perfectly normal, tha”.
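
To see how two differently cut copies can be stitched back together by their overlaps, here is a minimal Python sketch. It is a toy under stated assumptions, not the procedure used in real genome assembly: the function names `overlap_length` and `greedy_merge` and the `min_overlap` threshold are hypothetical, and always merging the pair with the longest overlap is just one simple strategy, which can go wrong on texts with many repeated passages.

```python
from typing import List


def overlap_length(left: str, right: str, min_overlap: int = 5) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Overlaps shorter than `min_overlap` are ignored (treated as 0), since a
    very short match could easily be a coincidence.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_merge(fragments: List[str], min_overlap: int = 5) -> List[str]:
    """Repeatedly paste together the two pieces with the longest overlap."""
    pieces = list(dict.fromkeys(fragments))  # drop exact duplicates
    while True:
        # Drop pieces that sit entirely inside another piece.
        pieces = [p for p in pieces
                  if not any(p != q and p in q for q in pieces)]
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(pieces):
            for j, right in enumerate(pieces):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return pieces  # nothing left that overlaps enough
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces)
                  if k not in (best_i, best_j)] + [merged]


# The six fragments quoted above, deliberately scrambled.
fragments = [
    "re perfectly normal, thank you very much.",
    "ber four, Privet Drive, were proud to",
    "Mr. and Mrs. Dursley, of number four, Priv",
    "say that they were perfectly normal, tha",
    "Mr. and Mrs. Dursley, of num",
    "et Drive, were proud to say that they we",
]
result = greedy_merge(fragments)
print(result)
```

For these six fragments the sketch ends with a single piece, the full opening sentence. With a larger `min_overlap`, or with fragments that barely overlap, it would instead return several disconnected pieces.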


Even though the fragments in the two cases are different, there’s quite a bit of overlap, and you can use those overlaps to figure out which fragments go together. Notice, for example, that the fragment “Mr. and Mrs. Dursley, of number four, Priv” overlaps with both “Mr. and Mrs. Dursley, of num” and “ber four, Privet Drive, were proud to.” This suggests pasting the last two fragments together, to get “Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to.” By continuing very carefully in this way, you could reconstruct quite long sequences from the book. You’d only get stuck if, by chance, the overlap between two fragments was so short that it made it hard to tell that they really were overlapping fragments of the same text. But if I gave you a third (and a fourth . . .) copy of the book randomly cut up in
