Drunkard's Walk - Leonard Mlodinow [67]
IT IS ONE OF THOSE CONTRADICTIONS of life that although measurement always carries uncertainty, the uncertainty in measurement is rarely discussed when measurements are quoted. If a fastidious traffic cop tells the judge her radar gun clocked you going thirty-nine in a thirty-five-mile-per-hour zone, the ticket will usually stick despite the fact that readings from radar guns often vary by several miles per hour.⁵ And though many students (along with their parents) would jump off the roof if doing so would raise their 598 on the math SAT to a 625, few educators talk about the studies showing that, if you want to gain 30 points, there’s a good chance you can do it simply by taking the test a couple more times.⁶ Sometimes meaningless distinctions even make the news. One recent August the Bureau of Labor Statistics reported that the unemployment rate stood at 4.7 percent. In July the bureau had reported the rate at 4.8 percent. The change prompted headlines like this one in The New York Times: “Jobs and Wages Increased Modestly Last Month.”⁷ But as Gene Epstein, the economics editor of Barron’s, put it, “Merely because the number has changed it doesn’t necessarily mean that a thing itself has changed. For example, any time the unemployment rate moves by a tenth of a percentage point…that is a change that is so small, there is no way to tell whether there really was a change.”⁸ In other words, if the Bureau of Labor Statistics measures the unemployment rate in August and then repeats its measurement an hour later, by random error alone there is a good chance that the second measurement will differ from the first by at least a tenth of a percentage point. Would The New York Times then run the headline “Jobs and Wages Increased Modestly at 2 P.M.”?
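The repeated-measurement argument can be made concrete with a rough simulation. The sketch below is only an illustration under stated assumptions: it treats each measurement as the true rate plus bell-shaped random error, and the error size of 0.1 percentage point is a hypothetical figure chosen for the example, not one quoted by the bureau. The function name `chance_of_visible_change` is likewise invented here.

```python
import random


def chance_of_visible_change(standard_error, threshold=0.1,
                             trials=100_000, seed=0):
    """Estimate how often two measurements of an UNCHANGED quantity
    differ by at least `threshold`, purely through random error.

    Assumption: each measurement is the true value plus independent
    Gaussian noise with the given standard error.
    """
    rng = random.Random(seed)
    true_rate = 4.7  # held fixed: nothing real changes between measurements
    hits = 0
    for _ in range(trials):
        first = rng.gauss(true_rate, standard_error)
        second = rng.gauss(true_rate, standard_error)
        if abs(first - second) >= threshold:
            hits += 1
    return hits / trials


# Hypothetical error size of 0.1 percentage point per measurement.
share = chance_of_visible_change(standard_error=0.1)
print(f"{share:.0%} of repeated measurements differ by at least 0.1 point")
```

Under these assumed numbers, roughly half of the repeated measurements differ by a tenth of a point or more even though the underlying rate never moves. A smaller assumed error would lower that share, and a larger one would raise it.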
The uncertainty in measurement is even more problematic when the quantity being measured is subjective, like Alexei’s English-class essay. For instance, a group of researchers at Clarion University of Pennsylvania collected 120 term papers and treated them with a degree of scrutiny you can be certain your own child’s work will never receive: each term paper was scored independently by eight faculty members. The resulting grades, on a scale from A to F, sometimes varied by two or more grades. On average they differed by nearly one grade.⁹ Since a student’s future often depends on such judgments, the imprecision is unfortunate. Yet it is understandable given that, in their approach and philosophy, the professors in any given college department often run the gamut from Karl Marx to Groucho Marx. But what if we control for that (that is, if the graders are given, and instructed to follow, certain fixed grading criteria)? A researcher at Iowa State University presented about 100 students’ essays to a group of doctoral students in rhetoric and professional communication whom he had trained extensively according to such criteria.¹⁰ Two independent assessors graded each essay on a scale of 1 to 4. When the scores were compared, the assessors agreed in only about half the cases. Similar results were found by the University of Texas in an analysis of its scores on college-entrance essays.¹¹ Even the venerable College Board expects only that, when assessed by two raters, “92% of all scored essays will receive ratings within ± 1 point of each other on the 6-point SAT essay scale.”¹²
Another subjective measurement that is given more credence than it warrants is the rating of wines. Back in the 1970s the wine business was a sleepy enterprise, growing, but mainly in the sales of low-grade jug wines. Then, in 1978, an event often credited with the rapid growth of that industry occurred: a lawyer turned self-proclaimed wine critic, Robert M. Parker Jr., decided that, in addition to his reviews, he would rate wines numerically on a 100-point scale. Over the years most other wine publications followed suit. Today annual wine sales in the United States exceed $20 billion, and millions of wine aficionados won’t lay their money on the counter without