Academic Legal Writing - Eugene Volokh [91]
But there's one problem: Every study (except one) that the book cited involved not randomly selected homosexual men, but men who were mostly or entirely drawn from samples of sexually transmitted disease patients, mostly patients with HIV. Most of the ellipses in the quote I give above (ellipses that were in the book itself) substitute for text that reveals this limitation in the data. For example, the source that the book quotes as saying “homosexual men ... reported a median of 1,160 lifetime sexual partners” actually said “homosexual men with AIDS reported a median of 1,160 lifetime sexual partners.”
Of course, people with sexually transmitted diseases are the very population that's likely to have had disproportionately high numbers of sexual partners, since having many sexual partners dramatically increases one's chances of getting infected. (You can get HIV just from one partner, but you're much likelier to get it if you have had a thousand partners.) Data from this subgroup of people tells us next to nothing about the practices of the median homosexual man generally.
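To see how strong this selection effect can be, here is a minimal sketch in Python. Everything in it is a made-up assumption for illustration, not data from any study: the population's partner counts, the 1% per-partner transmission risk, and the helper names `infection_probability` and `weighted_median`. It compares the median for the whole hypothetical population with the median you would expect to see among only those who became infected.

```python
from statistics import median

# Hypothetical per-partner transmission risk (an assumption, not a real estimate).
PER_PARTNER_RISK = 0.01


def infection_probability(partners, risk=PER_PARTNER_RISK):
    """Chance of at least one transmission across `partners` independent exposures."""
    return 1 - (1 - risk) ** partners


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total weight."""
    half = sum(weights) / 2
    running = 0.0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= half:
            return value
    raise ValueError("empty input")


# Made-up population of 100 people: most have few partners, a few have very many.
population = [1] * 30 + [5] * 30 + [10] * 25 + [50] * 10 + [500] * 4 + [1000] * 1

# Median for everyone in the population.
overall_median = median(population)

# Expected median within the infected subgroup: each person is weighted by
# his or her probability of having been infected.
weights = [infection_probability(n) for n in population]
infected_median = weighted_median(population, weights)

print("whole population:", overall_median)
print("infected subgroup:", infected_median)
```

Under these assumed numbers, the whole-population median is 5, while the expected median among the infected subgroup is 50. The behavior of the population never changed; only the way the sample was selected did.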
Imagine a study that found that “People who drink alcohol and are dying of liver disease reported drinking a median of 10 drinks a day, compared with 1 drink a day for people with hepatitis who are dying of liver disease.” Would it be quite proper to report it as “People who drink alcohol ... reported drinking a median of 10 drinks a day, compared with 1 drink a day for people with hepatitis ....”?
The only cited study that tried to measure the behavior of male homosexuals generally is the one that yielded the lowest number, 49.5. Even this study, though, was conducted in the 1970s, before the AIDS epidemic hit; and it also involved a self-selected sample, which makes its results highly unreliable (see Part XVII.G.2, p. 162 for more on that). As I mentioned above, the best data that I've seen suggests that, as of 1991–2002, the median homosexual man in the U.S. has about 10 lifetime sexual partners, compared to 6 for the median heterosexual man—a nontrivial difference, but nothing like what the excerpt above reports.
So we see the danger of inferring from one population subgroup (American male homosexuals with sexually transmitted diseases) to another (American male homosexuals generally). And in this example the danger was exacerbated by the book's not admitting that it was drawing the inference: The book claimed that it was speaking about the broader group, while it was really speaking about the narrower one.
This shows the importance of what Part XXIII.A stressed—read, quote, and cite the original data, not just the intermediate source that reports on the data, even if the intermediate source looks like a scholarly work. It's tempting to just use the intermediate source's account, without checking the sources: For instance, the original sources cited by the book are in several medical journals that you'd have to get from another library. But if you relied on the book, your article would be badly wrong. You would be letting the book's errors become your errors.
And this again shows the importance of making clear to your readers the inferences that you're drawing from the data. Sometimes you do have to infer from one population to another: You can't infer from people with sexually transmitted diseases to people generally, but you might be able to draw inferences when the groups are more similar. But acknowledge to the readers that you are drawing such an inference, and explain why you think this inference is legitimate.
3. Inferring from one variable to another
Arguments often extrapolate from one variable to another. For instance, if you're trying to determine whether ice cream consumption is correlated with some variable, you might look at ice cream production data. Production data is apparently easier to get than sales data, and certainly than consumption