Online Book Reader


I'm Feeling Lucky: The Confessions of Google Employee Number 59 - Douglas Edwards [181]

of information but the ally and protector of users who were searching for answers. In addition to being "fast, accurate, and easy to use," we would be a trusted friend.

So when a parallel discussion started in January 2003 about Google's own privacy policy and what we revealed to users about the data we collected from their searches, something clicked in my head. "We need to cross the streams," I thought. It was all related. Scumware. Privacy. The Google toolbar. "Yada," I realized in a moment of transcendental clarity, "yada."

Our "Not the usual yada yada" message had forestalled an uproar over our toolbar tracking users as they moved across the web. Now we could shelter ourselves from a PR cataclysm over privacy and fight scumware at the same time by employing a similar tactic. I knew a firestorm was coming. We were not immune to criticism about our privacy policy—from mild concerns to wild conspiracy theories. We had people's most intimate thoughts in our log files and, soon enough, people would realize it. We didn't know who searched for what, but, as I had seen after 9/11, there were ways to extract that information if someone was motivated to do so.

Chances are you've Googled yourself. Almost all of us have searched for our own names. When you do that, Google sees your IP address, the number corresponding to your computer's connection point on the Internet. If you connect to the Internet via a large commercial Internet service provider (ISP), a new IP address is theoretically assigned each time you log on and then reissued to others when you turn off your computer. In practice, however, your IP address may not change for days or even months.

If you've used Google before, most likely you also have a Google cookie on your computer—a unique string of digits Google placed there so it can remember your preferences each time you come back (preferences like "apply SafeSearch filtering" or "show results in Chinese"). Google doesn't know your name or your real-world location, though your IP address may reveal your city if your ISP assigns blocks of numbers to specific geographic regions.

Looking at all the searches conducted from one IP address by a computer with a cookie assigned to it over a period of time could give a search engine data about individual user behavior. That information would be invaluable in improving both the relevance of search results and the targeting of advertising.

Why is the information helpful? Say that for a single twenty-four-hour period you threw all the search terms entered by one cookie/IP address combo into a bucket and analyzed them to establish correlations. Then say you compared those correlations with those found in other buckets: other searches conducted by other cookied computers. Patterns would emerge. So if you found that a search for "best sushi in Mountain View" was often followed by a search for "Sushi Tomi restaurant," you might associate Sushi Tomi with the best sushi in Mountain View. A large search engine could compare tens of millions of buckets to determine how terms were related to one another. With that much data, you could derive some fairly definitive answers.
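To make the bucket idea concrete, here is a minimal sketch in Python. It is only an illustration of the reasoning above, not a description of how Google's systems actually worked. The log format (cookie, IP address, query, already in time order) and the function names `follow_up_counts` and `most_common_follow_up` are assumptions invented for this example.

```python
from collections import Counter, defaultdict


def follow_up_counts(log):
    """Count which query follows which, within each cookie/IP bucket.

    `log` is a hypothetical iterable of (cookie, ip, query) tuples,
    assumed to be in time order and to cover one twenty-four-hour period.
    """
    # Step 1: throw each cookie/IP combo's queries into its own bucket.
    buckets = defaultdict(list)
    for cookie, ip, query in log:
        buckets[(cookie, ip)].append(query)

    # Step 2: across all buckets, tally each (query, next query) pair.
    counts = defaultdict(Counter)
    for queries in buckets.values():
        for first, second in zip(queries, queries[1:]):
            if first != second:  # ignore a query repeated back to back
                counts[first][second] += 1
    return counts


def most_common_follow_up(counts, query):
    """Return the query that most often follows `query`, or None."""
    followers = counts.get(query)
    if not followers:
        return None
    return followers.most_common(1)[0][0]
```

Fed a log in which two of three searchers follow "best sushi in Mountain View" with "Sushi Tomi restaurant," `most_common_follow_up` would return "Sushi Tomi restaurant." With three buckets that is an anecdote; the point of the passage is that the same tally over tens of millions of buckets starts to look like an answer.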

Using searchers' data, though, creates a fundamental dilemma. How do you protect user privacy while retaining the maximum value of the data for improving the search engine that collected it? Part of Google's answer was to anoint Nikhil Bhatla our "privacy czar." One of the first questions Nikhil raised was about identifying a user strictly from the stream of queries tied to one cookie over time. He shared an anecdote about engineer Jeff Dean, who had been working in the logs system where user search data was recorded. Jeff noticed that one cookie had been conducting a very interesting series of queries on technical topics, using highly sophisticated search techniques. He was impressed by the searcher's acumen. Only after studying the data further did he realize that the query stream he was looking at came from his own computer.

Nikhil's question kicked off a privacy debate among Googlers that dragged on for
