What you’d really like to do is run a randomized, controlled trial so that when patients arrive they are randomly assigned to a doctor, even if that doctor is overwhelmed with other patients or not well equipped to handle a particular ailment.
But we are dealing with one set of real, live human beings who are trying to keep another set of real, live human beings from dying, so this kind of experiment isn’t going to happen, and for good reason.
Since we can’t do a true randomization, and since simply looking at patient outcomes in the raw data would be misleading, what’s the best way to measure doctor skill?
Thanks to the nature of the emergency room, there is another sort of de facto, accidental randomization that can lead us to the truth. The key is that patients generally have no idea which doctors will be working when they arrive at the ER. Therefore, the patients who show up between 2:00 and 3:00 P.M. on one Thursday in October are, on average, likely to be similar to the patients who show up the following Thursday, or the Thursday after that. But the doctors working on those three Thursdays will probably be different. So if the patients who came on the first Thursday have worse outcomes than the patients who came on the second or third Thursday, one likely explanation is that the doctors on that shift weren’t as good. (In this ER, there were usually two or three doctors per shift.)
There could be other explanations, of course, like bad luck or bad weather or an E. coli outbreak. But if you look at a particular doctor’s record across hundreds of shifts and see that the patients on those shifts have worse outcomes than is typical, you have a pretty strong indication that the doctor is at the root of the problem.
One last note on methodology: while we exploit information about which doctors are working on a shift, we don’t factor in which doctor actually treats a particular patient. Why? Because we know that the triage nurse’s job is to match patients with doctors, which makes the selection far from random. It might seem counterintuitive—wasteful, even—to ignore the specific doctor-patient match in our analysis. But in scenarios where selection is a problem, the only way to get a true answer is, paradoxically, to throw away what at first seems to be valuable information.
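To see why throwing away the match information helps, consider a minimal simulation sketch. This is not the authors’ actual analysis: the doctor counts, death rates, and triage rule below are invented for illustration. It builds a toy ER where a triage nurse steers sicker patients to the better doctor on duty, then compares two estimates of skill: the naive death rate among the patients each doctor actually treated, and the death rate across all patients on the shifts each doctor worked.

```python
import numpy as np

rng = np.random.default_rng(0)

N_DOCTORS, N_SHIFTS, N_PATIENTS = 10, 4000, 40  # invented sizes
# Negative skill = a better doctor (lowers each patient's death probability).
skill = rng.normal(0.0, 0.008, N_DOCTORS)

deaths_naive = np.zeros(N_DOCTORS)  # deaths among patients the doctor treated
n_naive = np.zeros(N_DOCTORS)
deaths_shift = np.zeros(N_DOCTORS)  # deaths on shifts the doctor worked
n_shift = np.zeros(N_DOCTORS)

for _ in range(N_SHIFTS):
    d1, d2 = rng.choice(N_DOCTORS, size=2, replace=False)  # two doctors on duty
    better, worse = (d1, d2) if skill[d1] < skill[d2] else (d2, d1)
    # The patient mix is random from shift to shift...
    severity = rng.uniform(0.0, 1.0, N_PATIENTS)
    # ...but triage steers sicker patients to the better doctor: nonrandom selection.
    doc = np.where(severity > 0.5, better, worse)
    p_death = np.clip(0.02 + 0.08 * severity + skill[doc], 0.0, 1.0)
    died = rng.random(N_PATIENTS) < p_death
    for d in (d1, d2):
        treated = doc == d
        deaths_naive[d] += died[treated].sum()
        n_naive[d] += treated.sum()
        deaths_shift[d] += died.sum()
        n_shift[d] += N_PATIENTS

naive_rate = deaths_naive / n_naive  # death rate of each doctor's own patients
shift_rate = deaths_shift / n_shift  # death rate on each doctor's shifts

print("corr(true skill, naive rate):", round(np.corrcoef(skill, naive_rate)[0, 1], 2))
print("corr(true skill, shift rate):", round(np.corrcoef(skill, shift_rate)[0, 1], 2))
```

With these made-up numbers, the naive per-doctor rate typically ranks the best doctors among the worst, because triage hands them the sickest patients, while the shift-level rate tracks true skill closely. That is the logic of the paradox: the "valuable" match information is exactly what carries the selection bias.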
So, applying this approach to Craig Feied’s massively informative data set, what can we learn about doctor skill?
Or, put another way: if you land in an emergency room with a serious condition, how much does your survival depend on the particular doctor you draw?
The short answer is…not all that much. Most of what looks like doctor skill in the raw data is in fact the luck of the draw, the result of some doctors getting more patients with less-threatening ailments.
This isn’t to say there’s no difference between the best and worst doctors in the ER. (And no, we’re not going to name them.) In a given year, an excellent ER doctor’s patients will have a twelve-month death rate that is nearly 10 percent lower than the average. This may not sound like much, but in a busy ER with tens of thousands of patients, an excellent doctor might save six or seven lives a year relative to the worst doctor.
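For a sense of the arithmetic, here is a hedged back-of-envelope version. The caseload and baseline death rate are invented round numbers; only the 10 percent figure comes from the text.

```python
# Back-of-envelope calculation with assumed round numbers.
patients_per_year = 3000    # assumed annual caseload for one busy ER doctor
avg_death_rate = 0.02       # assumed 12-month death rate for ER patients
relative_reduction = 0.10   # "nearly 10 percent lower than the average" (from the text)

lives_saved = patients_per_year * avg_death_rate * relative_reduction
print(lives_saved)          # 6.0, in the ballpark of "six or seven lives a year"
```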
Interestingly, health outcomes are largely uncorrelated with spending. This means the best doctors don’t spend any more money—for tests, hospital admittance, and so on—than the lesser doctors. This is worth pondering in an era when higher health-care spending is widely thought to produce better health-care outcomes. In the United States, the health-care sector accounts for more than 16 percent of GDP, up from 5 percent in 1960, and is projected to reach 20 percent by 2015.
So what are the characteristics of the best doctors?