The Filter Bubble - Eli Pariser
Books were ideal for a few reasons. For starters, the book industry was decentralized; the biggest publisher, Random House, controlled only 10 percent of the market. If one publisher wouldn’t sell to him, there would be plenty of others who would. And people wouldn’t need as much time to get comfortable with buying books online as they might with other products—a majority of book sales already happened outside of traditional bookstores, and unlike clothes, you didn’t need to try them on. But the main reason books seemed attractive was simply the fact that there were so many of them—3 million active titles in 1994, versus three hundred thousand active CDs. A physical bookstore would never be able to inventory all those books, but an online bookstore could.
When he reported this finding to his boss, the investor wasn’t interested. Books seemed like a kind of backward industry in an information age. But Bezos couldn’t get the idea out of his head. Without a physical limit on the number of books he could stock, he could provide hundreds of thousands more titles than industry giants like Borders or Barnes & Noble, and at the same time, he could create a more intimate and personal experience than the big chains.
Amazon’s goal, he decided, would be to enhance the process of discovery: a personalized store that would help readers find books and introduce books to readers. But how?
Bezos started thinking about machine learning. It was a tough problem, but a group of engineers and scientists had been attacking it at research institutions like MIT and the University of California at Berkeley since the 1950s. They called their field “cybernetics”—a word taken from Plato, who coined it to mean a self-regulating system, like a democracy. For the early cyberneticists, there was nothing more thrilling than building systems that tuned themselves, based on feedback. Over the following decades, they laid the mathematical and theoretical foundations that would guide much of Amazon’s growth.
In 1990, a team of researchers at the Xerox Palo Alto Research Center (PARC) applied cybernetic thinking to a new problem. PARC was known for coming up with ideas that were broadly adopted and commercialized by others—the graphical user interface and the mouse, to mention two. And like many cutting-edge technologists at the time, the PARC researchers were early power users of e-mail—they sent and received hundreds of messages. E-mail was great, but the downside was quickly obvious. When it costs nothing to send a message to as many people as you like, you can quickly get buried in a flood of useless information.
To keep up with the flow, the PARC team started tinkering with a process they called collaborative filtering, which ran in a program called Tapestry. Tapestry tracked how people reacted to the mass e-mails they received—which items they opened, which ones they responded to, and which they deleted—and then used this information to help order the inbox. E-mails that people had engaged with a lot would move to the top of the list; e-mails that were frequently deleted or unopened would go to the bottom. In essence, collaborative filtering was a time saver: Instead of having to sift through the pile of e-mail yourself, you could rely on others to help presift the items you’d received.
And of course, you didn’t have to use it just for e-mail. Tapestry, its creators wrote, “is designed to handle any incoming stream of electronic documents. Electronic mail is only one example of such a stream: others are newswire stories and Net-News articles.”
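The core idea behind Tapestry can be sketched in a few lines. This is a hypothetical illustration, not Tapestry's actual code: the message names, action labels, and weights are all invented here. The point is simply that other readers' reactions—opens, replies, deletions—become a collective score that reorders your own inbox.

```python
from collections import defaultdict

# Illustrative weights: engagement raises a message's score, deletion lowers it.
WEIGHTS = {"open": 1.0, "reply": 3.0, "delete": -2.0}

def score_messages(events):
    """events: list of (message_id, action) pairs recorded from all readers."""
    scores = defaultdict(float)
    for message_id, action in events:
        scores[message_id] += WEIGHTS.get(action, 0.0)
    return scores

def presift(inbox, events):
    """Order a reader's inbox so messages others engaged with come first."""
    scores = score_messages(events)
    return sorted(inbox, key=lambda mid: scores[mid], reverse=True)

events = [
    ("memo-1", "open"), ("memo-1", "reply"),    # widely engaged with
    ("memo-2", "delete"), ("memo-2", "delete"), # widely discarded
    ("memo-3", "open"),
]
print(presift(["memo-1", "memo-2", "memo-3"], events))
# ['memo-1', 'memo-3', 'memo-2']
```

The presifting the PARC team described is exactly this: no one reads the pile for you, but everyone's incidental reactions are pooled into an ordering, for e-mail or for any other stream of documents.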
Tapestry had introduced collaborative filtering to the world, but in 1990, the world wasn’t very interested. With only a few million users, the Internet was still a small ecosystem, and there just wasn’t much information to sort or much bandwidth to download with. So for years collaborative filtering remained the domain of software