The Filter Bubble - Eli Pariser [79]
To get a sense of how this works, consider Google Translate, which can now do a passable job translating automatically among nearly sixty languages. You might imagine that Translate was built with a really big, really sophisticated set of translating dictionaries, but you’d be wrong. Instead, Google’s engineers took a probabilistic approach: They built software that could identify which words tended to appear in connection with which, and then sought out large chunks of data that were available in multiple languages to train the software on. One of the largest chunks was patent and trademark filings, which are useful because they all say the same thing, they’re in the public domain, and they have to be filed globally in scores of different languages. Set loose on a hundred thousand patent applications in English and French, Translate could determine that when “word” showed up in the English document, “mot” was likely to show up in the corresponding French paper. And as users correct Translate’s work over time, it gets better and better.
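The co-occurrence idea can be made concrete with a toy sketch. This is not Google’s actual system, only a minimal illustration under stated assumptions: the four sentence pairs are invented, the names `PAIRS`, `train`, and `best_translation` are hypothetical, and the Dice coefficient stands in for whatever statistics a real translation engine would use.

```python
from collections import Counter
from itertools import product

# Toy aligned sentence pairs, invented for illustration (an assumption,
# standing in for the patent filings described in the text).
PAIRS = [
    ("the word", "le mot"),
    ("the house", "la maison"),
    ("a word", "un mot"),
    ("the blue house", "la maison bleue"),
]


def train(pairs):
    """Count how often each word, and each English/French word pair,
    appears across the aligned sentences."""
    en_count, fr_count, joint = Counter(), Counter(), Counter()
    for en, fr in pairs:
        en_words, fr_words = set(en.split()), set(fr.split())
        en_count.update(en_words)
        fr_count.update(fr_words)
        # Every English word co-occurs with every French word in the pair.
        joint.update(product(en_words, fr_words))
    return en_count, fr_count, joint


def best_translation(word, model):
    """Return the French word most strongly associated with `word`,
    scored by the Dice coefficient, or None if the word was never seen."""
    en_count, fr_count, joint = model
    scores = {
        f: 2 * joint[(word, f)] / (en_count[word] + fr_count[f])
        for f in fr_count
        if joint[(word, f)] > 0
    }
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    model = train(PAIRS)
    print(best_translation("word", model))
```

No dictionary is involved at any point: “mot” wins for “word” only because the two keep turning up in matching sentences. With so little data many words remain ambiguous (here “house” is tied between “la” and “maison”), which is why the approach depends on very large bodies of parallel text.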
What Translate is doing with foreign languages Google aims to do with just about everything. Cofounder Sergey Brin has expressed his interest in plumbing genetic data. Google Voice captures millions of minutes of human speech, which engineers are hoping they can use to build the next generation of speech recognition software. Google Research has captured most of the scholarly articles in the world. And of course, Google’s search users pour billions of queries into the machine every day, which provide another rich vein of cultural information. If you had a secret plan to vacuum up an entire civilization’s data and use it to build artificial intelligence, you couldn’t do a whole lot better.
As Google’s protobrain increases in sophistication, it’ll open up remarkable new possibilities. Researchers in Indonesia can benefit from the latest papers from Stanford (and vice versa) without waiting for translation delays. In a matter of a few years, it may be possible to have an automatically translated voice conversation with someone speaking a different language, opening up whole new channels of cross-cultural communication and understanding.
But as these systems become increasingly “intelligent,” they also become harder to control and understand. It’s not quite right to say they take on a life of their own—ultimately, they’re still just code. But they reach a level of complexity at which even their programmers can’t fully explain any given output.
This is already true to a degree with Google’s search algorithm. Even to its engineers, the workings of the algorithm are somewhat mysterious. “If they opened up the mechanics,” says search expert Danny Sullivan, “you still wouldn’t understand it. Google could tell you all two hundred signals it uses and what the code is and you wouldn’t know what to do with them.” The core software engine of Google search is hundreds of thousands of lines of code. According to one Google employee I talked to who had spoken to the search team, “The team tweaks and tunes, they don’t really know what works or why it works, they just look at the result.”
Google promises that it doesn’t tilt the deck in favor of its own products. But the more complex and “intelligent” the system gets, the harder it’ll be to tell. Pinpointing where bias or error exists in a human brain is difficult or impossible—there are just too many neurons and connections to narrow it down to a single malfunctioning chunk of tissue. And as we rely on intelligent systems like Google’s more, their opacity could cause real problems—like the still-mysterious machine-driven “flash crash” that caused the Dow to drop 600 points in a few minutes on May 6, 2010.
In a provocative article in Wired, editor-in-chief Chris Anderson argued that huge