Final Jeopardy (Alexandra Cooper Mysteries) - Linda Fairstein [36]
For now, Chu-Carroll found herself contemplating academic heresy. Like college students paging through Cliff’s Notes or surfing Wikipedia, she began to wonder whether Blue J should bother with books at all. Each one contained so many passages that could be misconstrued. In the lingo of her colleagues, books had a dismally low signal-to-noise ratio. The signals, the potential answers, swam in oceans of words, so-called noise.
Imagine Blue J reading Mark Twain’s Huckleberry Finn. In one section, Huck and the escaped slave, Jim, are contemplating the night sky:
We had the sky up there, all speckled with stars, and we used to lay on our backs and look up at them, and discuss about whether they was made or only just happened. Jim he allowed they was made, but I allowed they happened; I judged it would have took too long to MAKE so many. Jim said the moon could a LAID them; well, that looked kind of reasonable, so I didn’t say nothing against it, because I’ve seen a frog lay most as many, so of course it could be done.
Assuming that Blue J could slog through the idiomatic language—no easy job for a computer—it could “learn” something about the cosmos. Both characters, it appeared, agreed that the moon, like a frog, could have laid the stars. It seemed “reasonable” to them, a conclusion Blue J would be likely to respect. A human would put that passage into context, learn something about Jim and Huck, and perhaps laugh. Blue J, it was safe to say, would never laugh. It would likely take note of an utterly fallacious parent-offspring relationship between the moon and the stars and record it. No doubt its mad hunt through hundreds of sources to answer a single Jeopardy clue would bring in much more astronomical data and statistically overwhelm this passage. In time, maybe the machine would develop trusted sources for such astronomical questions and wouldn’t be so foolish as to consult Huck Finn and Jim about the cosmos. But still, most books had too many words—too much noise—for the job ahead.
This led to an early conclusion about a Jeopardy machine. It didn’t need to know books, plays, symphonies, or TV sitcoms in great depth. It only needed to know about them. Unlike literature students, the machine would not be pressed to compare and contrast the themes of family or fate in Hamlet with those in Oedipus Rex. It just had to know they were there. When it came to art, it wouldn’t be evaluating the brushwork of Velázquez and Manet. It only needed to know some basic biographical facts about them, along with a handful of their most famous paintings. Ken Jennings, Ferrucci’s team learned, didn’t prepare for Jeopardy by plowing through big books. In Brainiac, he described endless practice with flash cards. The conclusion was clear: The IBM team didn’t need a genius. They had to build the world’s most impressive dilettante.
From their statistical analysis of twenty thousand Jeopardy clues drawn randomly from the past twenty years, Chu-Carroll and her colleagues knew how often each category, from U.S. presidents to geography, was likely to pop up. Cities and countries each accounted for a bit more than 2 percent of the clues; Shakespeare and Abraham Lincoln were regulars on the big board. The team proceeded to load Blue J with the data most likely to contain the answers. It was a diet full of lists, encyclopedia entries, dictionaries, thesauruses, newswire articles, and downloaded Web pages. Then they tried out batches of Jeopardy clues to see how