Final Jeopardy (Alexandra Cooper Mysteries) - Linda Fairstein [42]
By early 2008, Blue J’s scores were rising. On the Jennings Arc posted on the wall of the War Room, it was climbing toward the champion but was still 30 percent behind him. If it continued the pace of the last six months, it might reach Jennings by mid-2008 or even earlier. But that wasn’t the way things worked. Early on, Ferrucci said, the team had taught Blue J the easy lessons. “In those first months, we could put in a new algorithm and see its performance jump by two or three percent,” he said. But with the easy fixes in, the advances would be smaller, measured in tenths of a percentage point.
The answer was to focus on Blue J’s mistakes. Each one pointed to a gap in its knowledge or a misunderstanding: something to fix. In that sense, each mistake represented an opportunity. The IBM team, working in 2007 with Eric Nyberg, a computer scientist at Carnegie Mellon, had designed Blue J’s architecture for what they called blame detection. The machine monitored each stage of its long and intricate problem-solving process. Every action generated data, lots of it. Analysts could carry out detailed studies of the pathways and performance of algorithms on each question. They could review each document the computer consulted and the conclusions it drew from it. In short, the team could zero in on each decision that led to a mistake and use that information to improve Blue J’s performance.
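The idea of recording every stage so that a mistake can be traced back to the step that caused it can be illustrated with a minimal sketch. It is not Blue J's actual code: the stage names, the `Trace` class, and the `first_stage_without` helper are all hypothetical, chosen only to show the general shape of such a record.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TraceEntry:
    stage: str            # hypothetical stage name
    output: List[str]     # candidate answers surviving this stage, best first


@dataclass
class Trace:
    entries: List[TraceEntry] = field(default_factory=list)

    def record(self, stage: str, output: List[str]) -> List[str]:
        """Store what a stage produced, then pass the output through unchanged."""
        self.entries.append(TraceEntry(stage, list(output)))
        return output


def first_stage_without(trace: Trace, correct_answer: str) -> Optional[str]:
    """Return the first stage whose output no longer contains the correct answer."""
    for entry in trace.entries:
        if correct_answer not in entry.output:
            return entry.stage
    return None


# Illustrative run for a single clue (invented data).
trace = Trace()
trace.record("candidate_generation", ["Argentina", "Bolivia", "Peru"])
trace.record("evidence_scoring", ["Bolivia", "Argentina"])
trace.record("final_answer", ["Bolivia"])

blamed_stage = first_stage_without(trace, "Argentina")
```

In this invented run the correct answer survives until the final selection, so `blamed_stage` points an analyst at the last step rather than at candidate generation.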
The researchers were swimming in examples of misunderstandings and wrong turns. Blue J, after all, was failing on half of the clues. But which ones represented larger patterns? Fixing those might enhance its analysis in an entire category. One South America clue, for example, appeared to signal a glitch on Blue J’s part in analyzing geography—an important category in Jeopardy. The clue asked for the country that shared the longest border with Chile. Blue J came back with the wrong answer: What is Bolivia? The correct response (What is Argentina?) was its second choice.
Analyzing the clue, researchers saw that Blue J had received conflicting answers from two algorithms. The one specializing in geography had come back with the right answer, Argentina, whose 5,308-kilometer border with Chile dwarfed the 861-kilometer Chilean-Bolivian frontier. But another algorithm had counted references to these countries and their borders and found a lot more talk about the Bolivian stretch. (Chile and Bolivia have been engaged in a border dispute since the 1870s, generating a steady stream of news coverage.) Lacking any other context, this single-minded algorithm suggested Bolivia—and Blue J unwisely trusted it. “The computer was paying more attention to popularity than geography,” Ferrucci said. Researchers went on to tinker with the ratios underlying Blue J’s judgment. They instructed it to give more weight to the geography in that type of question and a bit less to popularity. Then they tested the system on a large batch of similar geography clues. Blue J’s performance improved. They then ran it on a group of random clues to find out if the adjustment affected Blue J’s performance elsewhere, perhaps turning correct answers into mistakes. That happened all too often. But this time the change helped. Blue J’s performance inched ahead another tiny fraction of a percent.
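The reweighting described above can be sketched in a few lines. This is only an illustration under stated assumptions: the evidence scores, the weight values, the second clue, and the function names are all invented, and Blue J's real scoring was far more elaborate than a two-term weighted sum.

```python
from typing import Dict, List

Evidence = Dict[str, Dict[str, float]]  # candidate -> scorer name -> score

# Hypothetical scorer outputs for the Chile border clue (illustrative values).
CHILE_CLUE = {
    "answer": "Argentina",
    "evidence": {
        "Argentina": {"geography": 0.9, "popularity": 0.3},
        "Bolivia": {"geography": 0.2, "popularity": 0.9},
    },
}

# Hypothetical unrelated clue on which the popular candidate is also correct.
OTHER_CLUE = {
    "answer": "A",
    "evidence": {
        "A": {"geography": 0.5, "popularity": 0.8},
        "B": {"geography": 0.6, "popularity": 0.2},
    },
}


def rank(evidence: Evidence, weights: Dict[str, float]) -> List[str]:
    """Order candidates by the weighted sum of their scorer outputs, best first."""
    def score(candidate: str) -> float:
        return sum(weights[name] * value for name, value in evidence[candidate].items())
    return sorted(evidence, key=score, reverse=True)


def accuracy(clues: List[dict], weights: Dict[str, float]) -> float:
    """Fraction of clues whose top-ranked candidate matches the known answer."""
    correct = sum(1 for clue in clues if rank(clue["evidence"], weights)[0] == clue["answer"])
    return correct / len(clues)


# Assumed weights before and after the adjustment.
BEFORE = {"geography": 0.4, "popularity": 0.6}
AFTER = {"geography": 0.6, "popularity": 0.4}

top_before = rank(CHILE_CLUE["evidence"], BEFORE)[0]
top_after = rank(CHILE_CLUE["evidence"], AFTER)[0]

# Check the targeted clue first, then a broader set to catch regressions.
targeted_after = accuracy([CHILE_CLUE], AFTER)
general_before = accuracy([CHILE_CLUE, OTHER_CLUE], BEFORE)
general_after = accuracy([CHILE_CLUE, OTHER_CLUE], AFTER)
```

With these made-up numbers, the heavier popularity weight puts Bolivia on top, and shifting weight toward geography promotes Argentina without flipping the second clue. The second accuracy check mirrors the team's run on random clues: a change is kept only if it does not turn correct answers elsewhere into mistakes.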
The Jeopardy clues, nearly all of them from the J! Archive Web site, were the test bed for this stage of Blue J’s education. Eric Brown, Ferrucci’s top lieutenant, oversaw this cache along with Chu-Carroll. Brown was