Online Book Reader

Reinventing Discovery - Michael Nielsen [62]

the current state of the data web is messy and chaotic and incomplete. That’s okay: the early days of a new technology are often messy. Think of how messy and chaotic the early history of aviation was, in the 1890s and early 1900s, before the Wright brothers first flew. Dozens of people were pursuing their own ideas about the best way to build heavier-than-air flying machines. It was out of that mess of ideas that the first airplanes slowly emerged. In a similar way, today thousands of people and organizations have their own ideas about the best way to build the data web. All are aiming in roughly the same direction, but there are many differences in the details. Perhaps the best-known effort comes from academia, where many researchers are developing an approach called the semantic web. In the business world, the state of affairs is more fluid, as companies try out many different ways of sharing data. Because of these many approaches, there are passionate arguments about the best way to build the data web, often carried out with great conviction and certainty. But the data web is still in its infancy, and it’s too early to say which approach will succeed. For these reasons, I’ll use the term “data web” rather loosely to refer to all open data, taken together in aggregate. It’s a bit of an exaggeration, since much of that data isn’t properly linked up, or is hard to find online. But that linking is coming, and so I’ve taken some license.

If we don’t know what technology will ultimately be used to build the data web, how can we be sure the data web will grow and flourish? We can because a large and growing number of people want to share their data, and to link it up with other sources. We’ve seen a little of how this is happening in science. It’s also true of many businesses and governments, some of which are making at least some data open. The website Twitter, for example, makes some of its data openly available, and this has led to the creation of third-party services such as TwitPic, which makes it easy to share photos on Twitter, and Tweetdeck, which offers a streamlined way of using Twitter. As another example, the day after US President Barack Obama’s inauguration he issued a memorandum on “Transparency and Open Government.” This memorandum led to the creation of a website called data.gov, where the US government shares more than 1,200 open data sets on subjects ranging from energy use to aviation accidents to television licenses. Examples such as these are driving the development of technologies to share data across the greatest number of users. Whichever technology wins broad adoption will become, by default, the data web. That’s why we don’t need to know which technological vision of the data web will win to conclude that the data web is inevitable.

Perhaps the most impressive steps toward the data web to date have been taken in biology. Biologists are picking off chunks of the biological world and mapping them out, building toward a unified map of all of biology. We’ve discussed some of these chunks—the human genome, the haplotype map, and the just-beginning human connectome. But there are many more. There are online databases that describe the biological world at a very small level, for example mapping out protein structure and function, and the many possible interactions between proteins (the “interactome”). There are online databases describing the large-scale biological world, mapping out things such as animal migration patterns, and even catalogs that attempt to map out all the world’s species. And there are online databases at every level in between, a plethora of resources for the description of the biological world. Wikipedia’s list of biological databases has more than 100 entries as of April 2011. Those databases can potentially be linked up, to reflect the connections in biological systems: genetic information is linked to protein information, which is linked to information about protein interactions, which is linked to information about metabolism, and so on, all building toward a unified map of biology.
