Proofiness - Charles Seife [41]
The typical modern poll of 1,000 people has a margin of error of about 3.1 percent. Increase the sample size, and the margin of error shrinks. A poll of 4,000 people has a margin of error of about 1.6 percent; in a sample of 16,000 the margin of error drops again to about 0.78 percent.
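The shrinking margins quoted above follow from the usual square-root rule for a simple random sample. The sketch below is only an illustration: it assumes the common 95 percent convention (z = 1.96) and the worst case of an evenly split electorate (p = 0.5), neither of which the text states, and the function name `margin_of_error` is made up for this example.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a simple random sample of size n.

    Assumptions (not from the text): 95% confidence (z = 1.96) and the
    worst-case proportion p = 0.5.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes taken from the passage above.
for n in (1_000, 4_000, 16_000):
    print(f"n = {n:>6,}: about {100 * margin_of_error(n):.2f} percent")
```

Under these assumptions the results land close to the figures in the text, and they make the underlying pattern visible: quadrupling the sample only halves the margin of error.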
Week by week, the results trickled in as people returned their ballot cards; each week the Literary Digest clucked and strutted, blaring the accuracy of its polls and the importance of its huge sample. Right out of the starting gate, the magazine had more than 24,000 responses. This would correspond to a margin of error of roughly 0.6 percent—way, way lower than what even most modern polls can claim. By the week before the election, the sample had risen to an unbelievable 2,376,523 ballots, all tabulated by hand. This would correspond to a margin of error of six-hundredths of a percent: 0.06 percent. Though the editors of the Literary Digest didn’t express it in these terms—the term “margin of error” hadn’t yet come into fashion—the tremendous sample size gave them enormous confidence in the poll’s results. The week before the election, they couldn’t resist bragging:
The Poll represents the most extensive straw ballot in the field—the most experienced in view of its twenty-five years of perfecting—the most unbiased in view of its prestige—a Poll that has always previously been correct.
That perfect record was about to become roadkill: the poll predicted that Alf Landon would beat Franklin Roosevelt by a large margin. The election, of course, was a landslide in the opposite direction.
The Digest predicted that Landon would win about 54 percent of the popular vote, and the margin of error was a mere 0.06 percent. Instead, Landon got only 37 percent. The poll had missed by about 17 points, a huge mistake, more than 250 times larger than the margin of error would seem to allow. At one blow, the Literary Digest’s reputation for accuracy lay bleeding at the side of the road; the large-samples-equals-accurate-polls dictum was dead. What went wrong?
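The size of that miss can be checked with a few lines of arithmetic. This is a rough sketch that assumes the same conventional formula for a 95 percent margin of error (z = 1.96, p = 0.5), which the text does not spell out; the variable names are illustrative only.

```python
import math

ballots = 2_376_523           # final sample size reported in the text
predicted, actual = 54, 37    # Landon's predicted and actual share, in percent

# Assumed convention: 95% confidence, worst-case proportion of 0.5.
moe_points = 100 * 1.96 * math.sqrt(0.25 / ballots)
miss_points = predicted - actual
ratio = miss_points / moe_points

print(f"margin of error: about {moe_points:.2f} percentage points")
print(f"miss: {miss_points} points, roughly {ratio:.0f} times the margin of error")
```

Under these assumptions the ratio comes out comfortably above 250, consistent with the comparison made in the text.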
To modern pollsters, the answer is not hard to find. It’s implied by the Digest’s own description of how they picked the ten million voters to survey: “The mailing list is drawn from every telephone book in the United States, from the rosters of clubs and associations, from city directories, lists of registered voters, classified mail-order and occupational data.”
The year 1936 fell in the midst of the Great Depression. There was a great division between rich and poor; the rich tended to vote Republican and the poor tended to vote Democratic. Phones were still not in the majority of households, and the households that had them tended to be richer, and more Republican, than those that didn’t. Using the telephone directory to generate a mailing list therefore introduced a bias: the list was enriched with Republicans at the expense of Democrats. The same was true of the rosters of clubs and associations, especially automobile associations, whose members leaned Republican too. Occupational data excluded the unemployed, who tended to vote Democratic. So by drawing their names from phone books, lists of club members, and occupational data, the editors of the Literary Digest were inadvertently reaching more Republicans than Democrats. Their sample was not truly representative of the voting population of the United States, and that created a systematic error in their poll.
Unlike a statistical error, a systematic error like this doesn’t diminish as the sample size grows. It doesn’t matter whether your sample is a hundred or a thousand or ten million people: the error caused by a poor choice of sample stays large even as the margin of error shrinks to insignificance. The failure to recognize this source of error made the editors of the Literary Digest guilty of (ignorant) disestimation: they thought their sample size made their poll accurate to within a tiny fraction of a percent, when in fact it couldn’t be trusted within ten or fifteen points.
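A small simulation can make the contrast between statistical and systematic error concrete. It is a deliberately simplified sketch, not a reconstruction of the Digest's method: it assumes that support for Landon within the skewed mailing list equals the 54 percent the poll predicted, that true support is the 37 percent he actually received, and that every respondent answers independently. The names `poll` and `compare` are hypothetical.

```python
import random

TRUE_SUPPORT = 0.37    # Landon's actual share, from the text
FRAME_SUPPORT = 0.54   # assumed share within the skewed mailing list

def poll(n, support, rng):
    """Simulate n respondents, each backing the candidate with probability `support`."""
    hits = sum(rng.random() < support for _ in range(n))
    return hits / n

def compare(sizes, seed=0):
    """For each sample size, return (n, error of a random sample, error of the skewed list)."""
    rng = random.Random(seed)
    rows = []
    for n in sizes:
        fair_err = abs(poll(n, TRUE_SUPPORT, rng) - TRUE_SUPPORT)
        skewed_err = abs(poll(n, FRAME_SUPPORT, rng) - TRUE_SUPPORT)
        rows.append((n, fair_err, skewed_err))
    return rows

for n, fair_err, skewed_err in compare([100, 10_000, 1_000_000]):
    print(f"n = {n:>9,}   random sample: {100 * fair_err:5.2f} pts off"
          f"   skewed list: {100 * skewed_err:5.2f} pts off")
```

In a run like this, the random sample's error should shrink toward zero as the sample grows, while the skewed list stays roughly 17 points off at every size, which is the behavior the passage describes.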
Even before the fiasco, more sophisticated