I'm Feeling Lucky: The Confessions of Google Employee Number 59 - Douglas Edwards
To keep the failure of a single machine from corrupting the data and requiring a restart of the entire crawl, the war room team implemented checkpointing, which saved the state of the crawl so that if things blew up they could go back to the last checkpoint instead of starting over from the beginning.
With the hardware on its way to the data centers, and with the crawler, the indexer, the ranker, and the serving side all progressing, only one issue remained. Yahoo wanted its search results to appear current, so it insisted that at least part of its index be updated on a daily basis.
Think of a card shark at a blackjack table. She carefully arranges the cards to ensure that everyone gets a good hand, but not as good as hers. She starts dealing around the table. Now imagine her trying to add new cards to the deck in her hand as she deals, improving all the results, including her own. It was that kind of problem.
Google's PageRank algorithm required a full day and a half to score an index. Adding new information every twenty-four hours meant the pageranker would have to run faster while integrating the new data in all the appropriate places. "It is a much harder problem to update an index every day than it is to have a static index," Jeff explained. "There are many more moving pieces to deal with."
Jeff was maxed out. Sanjay was overloaded. Ben Gomes had a full plate. Developing an incremental indexing system could take a dedicated team of programmers years, and there were only weeks before the contract went into effect. Larry and Sergey, understanding the desperate need, threw the resource floodgates open and gave Urs carte blanche to do what was needed. Never one to waste an opportunity, Urs went all out. He hired a guy.
"I had no experience with crawls," Anurag Acharya recalls, "and Google didn't tell people what they would be working on." Urs had sung his siren song at perfect pitch and persuaded his former UC Santa Barbara colleague to abandon academia for Silicon Valley.
On his first day, Anurag focused on part of the indexing system. That same evening, Urs stopped by for a chat about his next assignment.
"I'll take a look at the logs," Anurag suggested, "and see what problems there might be."
"Why don't you do incremental indexing for a while," Urs casually replied, "and then we'll see?"
"I say 'Yeah,'" Anurag told me about that conversation, "like I know what doing incremental indexing really means. So there went the next five months."
Google didn't haze newbies, but Anurag must have felt as if he'd been led blindfolded into a room full of drunken frat boys with wooden paddles. He was hit with the complex issues of how to crawl additional sites, rank them appropriately, and then integrate them seamlessly into the existing index.
"I don't think I was brought in specifically for the index," he said. "It just happened. I showed up at that point, and at that point, those were the problems."
"Anurag started and a couple of us in the company knew him," said Ben Smith, who had been Anurag's student at UCSB, "and he basically just disappeared. He wouldn't come down to lunch. He was always in his office. He was there late for two months. What is up with this guy? And then Urs called me into his office and said, 'This is what's coming. Soon. Can you help him out?' Okay. Now I understood why."
Smith knew exactly what he was getting into. The first time Urs had asked him to take on incremental indexing had been almost a year earlier, on his first day as an intern at Google. Smith had refused. "I said," he told me with a laugh, "'That's way too big for a summer project, nobody really knows how to do that. I don't wanna tackle that.'" Now he and Anurag would have to figure it out in a matter of weeks.
Smith had already sped