I'm Feeling Lucky: The Confessions of Google Employee Number 59 - Douglas Edwards [92]
"That was a pretty tricky business," Gomes recalls, "because you had to copy onto machines that were currently serving the traffic and then at some point arrange for the flipover to take place. The new index was almost always larger and more complex, and you were never sure what would go wrong at a flip." Something almost always blew up. Gomes once started to explain the process to a colleague: "If things go well ..." Then he paused and asked, "Why the hell am I saying, 'If things go well'?"
Ben Smith owned the front-end infrastructure that enabled Google to serve the index to Yahoo. Smith and Craig Silverstein were the experts on the Google Web Server (GWS), the system that actually communicated directly with users. This put Smith in the role of riding herd over latency problems.
"It was the most miserable couple of months in my life," Smith said about the Yahoo buildup. "I'd be driving home with the sun coming up. I'd get four hours of sleep and then head back to work."
Every neck felt the hot breath of failure, and every throat tasted the bile waiting to erupt if they fell behind schedule. Though not all felt it with the same intensity. "I wasn't too worried about it," Urs told me. "What we promised Yahoo was a lot smaller than our goal in terms of coverage [the 1B index]. The scary things were the reliability parts, not the quality. They can't measure quality."
Urs knew that ultimately it was just a business deal and that Yahoo had the upper hand. "If Yahoo wanted to walk away," he conceded, "they could walk away. They didn't even need a pretext. It was a pretty one-sided contract."
Google would be taking a calculated risk by giving Yahoo guarantees, but Urs made that calculation and felt comfortable enough with the odds that he slept easy at night. "We promised ninety-nine-point-five percent uptime," he said, "and we weren't reeeeaaallllly quite there. So you look at the penalties and say, fine, if it occasionally happens, then we'll pay some of these penalties. Hopefully in a good partnership, people are going to be rewarding you for seriously trying. And we were definitely seriously trying."
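To put Urs's figure in perspective, here is a rough back-of-the-envelope sketch of how much downtime a 99.5 percent uptime promise leaves room for. The measurement windows (a 365-day year and a 30-day month) and the function name are assumptions chosen for illustration; the text does not say how the Yahoo contract actually measured uptime or set its penalties.

```python
# Rough downtime budget implied by an uptime guarantee.
# The measurement windows below are illustrative assumptions, not
# terms taken from the actual Yahoo contract.

HOURS_PER_YEAR = 365 * 24      # 8,760 hours
HOURS_PER_30_DAYS = 30 * 24    # 720 hours


def allowed_downtime_hours(uptime_percent, period_hours):
    """Hours of downtime permitted in a period at a given uptime percentage."""
    return period_hours * (100 - uptime_percent) / 100


per_year = allowed_downtime_hours(99.5, HOURS_PER_YEAR)
per_30_days = allowed_downtime_hours(99.5, HOURS_PER_30_DAYS)

print(f"Per year:    {per_year:.1f} hours")
print(f"Per 30 days: {per_30_days:.1f} hours")
```

Under those assumptions the budget works out to a little under two days of outage per year, or a few hours per month, which gives a sense of how thin the margin was for a team that "weren't reeeeaaallllly quite there."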
The End of "The"
So what did all this effort produce?
"Mostly," Jeff said, "we wanted to get many more queries per second served out of these machines. One of the big things we did was completely change the index format to make it much more compact."
In layman's terms, Google's index was full of spaces that didn't need to be there—it fit the data like baggy pants in constant danger of hitting the ground. Google wasted precious time searching empty pockets to find the bits it needed. One of JeffnSanjay's innovations was to shove most occurrences of a particular word into a single block in the database. Kind of like putting all your nickels in one pocket, dimes in another, so if you see a nickel, you know not to waste time searching through that pocket for a dime. The software searching the index could tell quickly from the block header that it didn't need anything in that block and skip ahead, which made each machine faster.
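The block-header idea can be sketched in a few lines. This is a toy model under stated assumptions, not Google's actual index format: the `Block` class, its fields, and the `find_postings` function are hypothetical names invented for illustration. Each block carries a small header naming the word it holds, so a search can check the header and skip the body when the word doesn't match.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A toy index block (hypothetical layout, for illustration only)."""
    word: str            # header: the word whose occurrences this block holds
    doc_ids: List[int]   # body: documents containing that word


def find_postings(blocks: List[Block], wanted: str) -> Tuple[List[int], int]:
    """Collect doc ids for `wanted`, counting how many block bodies were read."""
    hits: List[int] = []
    bodies_read = 0
    for block in blocks:
        if block.word != wanted:
            continue  # header says nothing useful is here, so skip the body
        bodies_read += 1
        hits.extend(block.doc_ids)
    return hits, bodies_read


# Nickels in one pocket, dimes in another.
index = [
    Block("nickel", [1, 4, 9]),
    Block("dime", [2, 4]),
    Block("nickel", [12, 15]),
    Block("quarter", [3]),
]

hits, bodies_read = find_postings(index, "nickel")
print(hits, bodies_read)
```

In this toy index a search for "nickel" reads only two of the four block bodies. On a real disk the saving comes from not having to read or decode the skipped data at all.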
"We improved that," Jeff said, "and we added skip tables to skip even larger chunks than just blocks." The goal was to minimize the number of times Google read each hard drive, because physically moving a head across a disk is far, far slower than doing things within an electronic circuit. JeffnSanjay rewrote the disk-scheduling systems to give each disk its own set of code. That cut search times by thirty to forty percent. A thirty-percent improvement was like running a four-minute mile in under three. A stunning accomplishment. But it wasn't enough.
So Jeff and Sanjay got rid of "the."
"The" is the most common word in English