Online Book Reader

The Filter Bubble - Eli Pariser [13]

of the Web was a lot more data than most search engines made use of. The fact that a Web page linked to another page could be considered a “vote” for that page. At Stanford, Page had seen professors count how many times their papers had been cited as a rough index of how important they were. Like academic papers, he realized, the pages that a lot of other pages cite—say, the front page of Yahoo—could be assumed to be more “important,” and the pages that those pages voted for would matter more. The process, Page argued, “utilized the uniquely democratic structure of the web.”
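The "links as votes" idea can be made concrete with a small sketch. The passage does not give a formula, so what follows is the simplified, widely published form of PageRank rather than anything Google runs in production. The function name `pagerank`, the damping value of 0.85, the iteration count, and the four-page toy web are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline, independent of its inbound links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its vote over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: every other page links to "yahoo".
toy_web = {
    "yahoo": ["news", "sports"],
    "news": ["yahoo"],
    "sports": ["yahoo"],
    "blog": ["yahoo", "news"],
}
ranks = pagerank(toy_web)
```

In this toy graph "yahoo" ends up with the highest rank because every other page votes for it. "news" edges out "sports" because it collects one extra vote from "blog", and "blog", which nobody links to, keeps only the baseline.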

In those early days, Google lived at google.stanford.edu, and Brin and Page were convinced it should be nonprofit and advertising free. “We expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers,” they wrote. “The better the search engine is, the fewer advertisements will be needed for the consumer to find what they want.... We believe the issue of advertising causes enough mixed incentives that it is crucial to have a competitive search engine that is transparent and in the academic realm.”

But when they released the beta site into the wild, the traffic chart went vertical. Google worked—out of the box, it was the best search site on the Internet. Soon, the temptation to spin it off as a business was too great for the twenty-something cofounders to bear.

In the Google mythology, it is PageRank that drove the company to worldwide dominance. I suspect the company likes it that way: it’s a simple, clear story that hangs the search giant’s success on a single ingenious breakthrough by one of its founders. But from the beginning, PageRank was just a small part of the Google project. What Brin and Page had really figured out was this: The key to relevance, the solution to sorting through the mass of data on the Web, was ... more data.

It wasn’t just which pages linked to which that Brin and Page were interested in. The position of a link on the page, the size of the link, and the age of the page all mattered. Over the years, Google has come to call these clues embedded in the data “signals.”

From the beginning, Page and Brin realized that some of the most important signals would come from the search engine’s users. If someone searches for “Larry Page,” say, and clicks on the second link, that’s another kind of vote: It suggests that the second link is more relevant to that searcher than the first one. They called this a click signal. “Some of the most interesting research,” Page and Brin wrote, “will involve leveraging the vast amount of usage data that is available from modern web systems.... It is very difficult to get this data, mainly because it is considered commercially valuable.” Soon they’d be sitting on one of the world’s largest stores of it.
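A click signal of this kind can be sketched as a simple tally. The text does not say how Google stores or weighs clicks, so the log format, the function names `click_shares` and `rerank`, and the example URLs below are hypothetical. The sketch only shows the basic idea of treating each click as a vote for a result under a given query.

```python
from collections import Counter, defaultdict


def click_shares(click_log):
    """Turn (query, clicked_url) pairs into per-query click shares."""
    counts = defaultdict(Counter)
    for query, url in click_log:
        counts[query][url] += 1

    shares = {}
    for query, counter in counts.items():
        total = sum(counter.values())
        # Fraction of this query's clicks that went to each URL.
        shares[query] = {url: c / total for url, c in counter.items()}
    return shares


def rerank(results, shares_for_query):
    """Order results by click share, highest first.

    sorted() is stable, so results with equal share keep their original order.
    """
    return sorted(
        results,
        key=lambda url: shares_for_query.get(url, 0.0),
        reverse=True,
    )


# Hypothetical log: two searchers clicked the second result, one the first.
log = [
    ("larry page", "b.example"),
    ("larry page", "b.example"),
    ("larry page", "a.example"),
]
original = ["a.example", "b.example", "c.example"]
shares = click_shares(log)
reordered = rerank(original, shares["larry page"])
```

Here the second result moves to the top because it drew two of the three clicks. A real system would have to combine this with many other signals and correct for the fact that higher-placed links attract more clicks simply by position, which this sketch ignores.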

Where data was concerned, Google was voracious. Brin and Page were determined to keep everything: every Web page the search engine had ever landed on, every click every user ever made. Soon its servers contained a nearly real-time copy of most of the Web. By sifting through this data, they were certain they’d find more clues, more signals, that could be used to tweak results. The search-quality division at the company acquired a black-ops kind of feel: few visitors and absolute secrecy were the rule.

“The ultimate search engine,” Page was fond of saying, “would understand exactly what you mean and give back exactly what you want.” Google didn’t want to return thousands of pages of links—it wanted to return one, the one you wanted. But the perfect answer for one person isn’t perfect for another. When I search for “panthers,” what I probably mean are the large wild cats, whereas a football fan searching for the phrase probably means the Carolina team. To provide perfect relevance, you’d need to know what each of us was interested in. You’d need to know that I’m pretty clueless about football; you’d need to know who I was.

The challenge was getting enough data to figure out what’s personally relevant to each user.
