Proofiness - Charles Seife [12]
In New York, scores on the state’s reading and math tests have risen sharply since 2005. Any politician who has anything to do with education policy in the state basks in the glow of the rising scores each year. In 2008, for example, New York City mayor Michael Bloomberg declared that the “dramatic upward trend” in state test scores showed that the city’s public schools were “in a different league” than when he took office.
However, soon after the state tests were administered in 2005, teachers told reporters that the test was much easier than the year before. “What a difference from the 2004 test,” a principal of a Bronx school told the New York Times. “I was so happy for the kids—they felt good after they took the 2005 test.” Scores on the state tests climbed year after year, rising dramatically and improbably. On national tests, though, New York didn’t seem like such a success story. In New York City, for example, scores on the national tests stayed more or less unchanged.
It’s pretty clear that New York State had been tinkering with the difficulty of the test. By making the test easier year after year, the state artificially inflated students’ scores. It seems as if the children are performing better on the tests, but in fact the rise in scores is meaningless; the 2004 test score doesn’t mean the same thing as the 2006 test score, which doesn’t mean the same thing as the 2008 test score. By pretending that these tests are equivalent, New York is engaging in another form of fruit-packing: comparing apples to oranges.
This particular trick is a game of units. As mentioned earlier in this chapter, every real-world number has a unit attached to it: a little tag like “feet” or “seconds” or “kilograms” or “dollars” that tells you what kind of measurement the number is tied to. When you compare two numbers to see which one’s bigger, it’s important to ensure that the units are the same; otherwise the comparison is meaningless. Sometimes this is obvious (which is longer, 55 seconds or 6.3 feet?), but sometimes it’s a little tricky to tell that the units aren’t quite the same. Which is better: a 50 percent score on test A or a 70 percent score on test B? It’s a meaningless comparison unless you have some way of converting a test A score into a test B score and vice versa.
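To make the units point concrete, here is a minimal Python sketch. It assumes a hypothetical `Quantity` type that tags each number with a unit string and refuses to compare numbers whose tags differ. The `convert` helper and the rule that 50 percent on test A corresponds to 80 percent on test B are invented purely for illustration; a real conversion between two tests would have to come from whoever designed them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Quantity:
    """A number with a unit tag attached (hypothetical illustration)."""
    value: float
    unit: str

    def __lt__(self, other: "Quantity") -> bool:
        # Comparing numbers with different units is meaningless, so refuse outright.
        if self.unit != other.unit:
            raise TypeError(f"cannot compare {self.unit} with {other.unit}")
        return self.value < other.value


def convert(q: Quantity, to_unit: str, rule: Callable[[float], float]) -> Quantity:
    """Re-express q in another unit using an explicit, caller-supplied rule."""
    return Quantity(rule(q.value), to_unit)


# Which is longer, 55 seconds or 6.3 feet? The question has no answer.
try:
    print(Quantity(55, "seconds") < Quantity(6.3, "feet"))
except TypeError as err:
    print(err)  # prints: cannot compare seconds with feet

score_a = Quantity(50, "test A %")
score_b = Quantity(70, "test B %")

# Hypothetical rule, assumed for illustration only: a score on the harder
# test A is worth 30 points more once translated onto the test B scale.
a_on_b_scale = convert(score_a, "test B %", lambda pct: pct + 30)
print(a_on_b_scale < score_b)  # prints: False (80 is not below 70)
```

The sketch fails loudly on purpose: comparing `score_a` with `score_b` directly raises an error instead of quietly returning an answer, and only an explicit conversion rule makes the comparison meaningful.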
There’s no value in directly comparing test scores from year to year unless the test makers ensure that a given score always means the same thing. Yet New York State made exactly that kind of comparison without any such guarantee, exploiting the apples-and-oranges problem to make it look as though its educational system were improving. The effect is very similar to cherry-picking: by comparing apples to oranges, New York distorted the meaning of the numbers, making the statistics appear to support an argument that they don’t.
Apple-orange comparisons can be really tough to spot, because units can be fluid creatures. Some of them change their meaning over time. The dollar, for example, is the unit that Americans use to measure money. But as a unit of wealth, it is always changing. Flip through an old magazine and look at the ads. The changing value of the dollar will hit you on the head. A December 1970 copy of Esquire that I happen to have in my office shows that a low-end two-door car cost $1,899. Name-brand carry-on luggage would set you back $17. A pair of mass-produced men’s shoes cost $19. Right now (2010), the equivalent low-end two-door car costs about $12,000. The name-brand carry-on luggage will set you back $130. The pair of mass-produced men’s shoes costs roughly $100. Even though all of these numbers seem to have