Reinventing Discovery: The New Era of Networked Science - Michael Nielsen
Taken together, these and other similar projects are mapping out our world in incredible, unprecedented detail. Of course, similar survey projects have been undertaken through the whole history of science, from the Almagest to the great botanists of the eighteenth and nineteenth centuries. But what’s going on today is special and unprecedented. The internet has dramatically expanded our ability to share and extract meaning from the maps we are building. This has caused a corresponding increase in their scientific impact, as the SDSS vividly illustrates. The result is an explosion in the number and ambition of these efforts, bringing about a great age of discovery, much like the age of the explorers of the fifteenth to eighteenth centuries. But whereas those explorers went to the limits of the Earth’s geography, the new discoverers are exploring and mapping out the boundaries of our scientific world.
As more data is shared online, the traditional relationship between making observations and analyzing data is changing. Historically, observation and analysis have been yoked together: the person who did the experiment was also the person who analyzed the data. But today it’s becoming more and more common for the most valuable analyses to be done by people outside the original laboratory. In some parts of science the division of labor is changing, with some people specializing in building the experimental apparatus and collecting data, while others specialize in analyzing the data from those experiments. In biology, for example, a new breed of biologist has emerged, the bioinformatician, whose chief skill isn’t growing cell cultures or the other traditional skills of the biology lab, but who rather combines the skills of computer programmer and biologist to analyze existing biological data. In a similar way, chemistry has seen the emergence of cheminformatics, and astronomy the emergence of astroinformatics. These are disciplines where the main emphasis isn’t on doing new experiments, but rather on finding new meaning in existing data.
Why is Data Being Made Open?
Let’s return to the puzzle of why and when scientists make their data openly available. A clue comes from the size of the experiments. The SDSS, the Ocean Observatories Initiative, and the Allen Brain Atlas all cost (or will cost) tens or hundreds of millions of dollars, and involve hundreds or thousands of people. Our earlier examples of open data, such as the Human Genome Project and the haplotype map, were also enormous projects. But most scientific experiments are far smaller. And in the smaller experiments, open data is the exception, not the rule. Before I became interested in open data, I worked for 13 years as a physicist. In that time, I saw hundreds of experiments, nearly all of them small experiments done in modest laboratories. So far as I know, not a single one of those experiments made a systematic effort to make its data open. We saw something similar in the opening chapter, in the early reluctance of scientists to share genetic data in online databases such as GenBank. This has only changed because of major cooperative efforts such as the Bermuda Agreement on sharing human genetic data. Across science, the situation today is changing, with some scientific journals and grant agencies enacting policies that encourage or mandate that data be made openly available after experiments have been published. But open data remains the exception, not the rule. If you head out to your local university and walk into a small laboratory, you’ll most likely find that the data is kept under lock and key, sometimes literally.
It seems, then, that big scientific projects are more likely to make their data open than small projects. Why is that the case? Part of