Reinventing Discovery - Michael Nielsen [56]
There is another factor inhibiting open scientific data, which is that even if you are willing to share your data, it can be difficult to do so in a way that’s useful to others. You can take all the photographs of galaxies you like, and share them with others, but those photographs are of limited scientific use without all sorts of extra information. What color filters did you use? Has the image been processed in any way, say, to remove bad or damaged pixels? Was there any haze when the photos were taken, which might obscure the image? And so on. In many parts of science it’s difficult to make sense of experimental data without detailed calibration information. And even with the data and the calibration information, other scientists still need an extremely detailed understanding of the experiment to make use of the data. Add on top of that problems like being sure everyone is using technical terminology in exactly the same way, file format conversion, and so on. Individually these are all soluble problems, but together they’re a formidable obstacle to sharing data in a way that’s useful.
These questions about sharing data are part of a deeper story, a story about why and when scientific knowledge is shared. Earlier in the book, I mentioned several times that scientists build their reputation and career based on the papers they’ve written. A reputation for writing great papers will get them a good scientific job, and continued grant support. Much of the challenge with data sharing is that the rewards scientists get for sharing their data are much more uncertain than the rewards for writing papers. It’s true that a few large collaborations such as the SDSS have won widespread kudos for sharing data. But in many areas of science, there are few established norms for how and when the use of someone else’s data should be acknowledged. And that means that sharing data is chancy for a scientist. It’s just not something scientists are typically well rewarded for, despite the fact that it’s enormously valuable. And so open data remains uncommon, especially in smaller laboratories. We will return to the question of how to get scientists enthused about sharing data (and other related questions) in chapters 8 and 9. For the purposes of the remainder of this chapter it’s enough that there is already a considerable (and increasing) amount of scientific data openly available, through projects such as the SDSS and the Human Genome Project.
Dreaming of the Data Web
So far in this chapter we’ve taken a concrete, near-term perspective, looking at existing projects such as the SDSS. But the internet is an infinitely flexible and extensible platform for manipulating human knowledge, with a potential that is open-ended. To understand that potential we need to expand our thinking, and move to a long view that sees the internet not as a ten- or twenty-year revolution, but as a hundred-