Academic Legal Writing - Eugene Volokh [89]
We often have to act based on inferences from correlation to causation. Whenever a change in educational policy or policing policy, for instance, is followed by rising test scores or by falling crime, people naturally notice, and think about trying to repeat the experiment elsewhere. When they see the correlation in several places at several times, they reasonably infer that the change is probably good.
The inference may be far from certain, but this is the way practical reasoning necessarily works. It's how we run our daily lives, and it's how we often have to do policymaking as well. If you study statistical methods, you'll learn various ways of drawing such inferences more reliably, through multiple regressions or other devices that can eliminate the effects of some obvious confounding factors (such as month, year, or location).
But for now, the important points are that (1) you must always understand when your sources infer from correlation to causation, and (2) you must always make clear to your readers when you make such an inference yourself. When you read a claim that “the tax cut caused economic growth,” check: Does the author's data actually show causation, or only correlation (i.e., that the tax cut was followed by economic growth)? If it shows only correlation, then recognize that concluding that the tax cut actually caused growth requires an inference, one that may not be accurate.
And when you make a similar claim yourself, make clear that the tax cut was simply followed by economic growth. This should alert the reader that the data simply shows correlation and not causation. And it should also remind you of the same thing, and prompt you to explain why you think this particular correlation does indeed show that the tax cut did cause economic growth—rather than, for instance, coming when the economy was about to start growing in any event.
2. Extrapolating across places, times, or populations
a. Generally
People often draw inferences based on data from a different time, a different place, or a different population subgroup. Consider, for instance, the following table from the 1990 edition of a leading college textbook on sexuality, which reports that the median homosexual man has had 250–499 sexual partners in his lifetime:
Only if one looks closely at the source citation does one get the sense that the data is pretty old (the copyright date is 1978, though it turns out the study was conducted in 1970). And only if one actually goes to the Bell & Weinberg book does one see that the study was conducted in only one city, San Francisco. The number of lifetime sexual partners that the median American homosexual man had in 1990, in the midst of the AIDS epidemic, might well have been different from the number in 1970. Nor can one reliably generalize from San Francisco to other cities, where both sexual mores and the number of potential partners might be quite different.
As it happens, the most serious problem with the original study is that it was based on a largely self-selected sample, see Part XVII.G.2, p. 162, so it was an unreliable estimate even of the behavior of the median homosexual in 1970 San Francisco. The college textbook noted this limitation three pages before, but still prominently reported the data.
But even had the study been based on a representative sample of homosexual men in 1970 San Francisco, it might not have been representative of all homosexual American men in 1990. And the textbook erred in labeling the data as “Sexual Partnerships Among Homosexuals” generally, rather than “Sexual Partnerships Among a Self-Selected Sample of Homosexuals in 1970 San Francisco.”
b. Extrapolating across places
So when you're reading a claim about the behavior of a large group, look closely at the data on which the claim rests. Is the data really about the large group as a whole? Or was it gathered only in one particular area?
Especially when the data is hard to gather—nationwide studies are often much more expensive and time-consuming than local