Drunkard's Walk - Leonard Mlodinow [72]

in the graphs in chapter 7. The second parameter determines the amount of spread in the curve. Though it didn’t receive its modern name until 1894, this measure is called the standard deviation, and it is the theoretical counterpart of the concept I spoke of earlier, the sample standard deviation. Roughly speaking, it is half the width of the curve at the point at which the curve is about 60 percent of its maximum height. Today the importance of the normal distribution stretches far beyond its use as an approximation to the numbers in Pascal’s triangle. It is, in fact, the most widespread manner in which data have been found to be distributed.
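The 60 percent figure can be checked against the usual formula for the bell curve. The symbols μ (the mean) and σ (the standard deviation) are notation assumed here for illustration; the passage itself uses no symbols.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
\frac{f(\mu \pm \sigma)}{f(\mu)} = e^{-1/2} \approx 0.607
```

One standard deviation to either side of the peak, the curve has dropped to roughly 61 percent of its maximum height, which is the "about 60 percent" quoted above.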

When employed to describe the distribution of data, the bell curve describes how, when you make many observations, most of them fall around the mean, which is represented by the peak of the curve. Moreover, as the curve slopes symmetrically downward on either side, it describes how the number of observations diminishes equally above and below the mean, at first rather sharply and then less drastically. In data that follow the normal distribution, about 68 percent (roughly two-thirds) of your observations will fall within 1 standard deviation of the mean, about 95 percent within 2 standard deviations, and 99.7 percent within 3.
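The three percentages can be reproduced with a few lines of Python. This is only a sketch: the helper name `fraction_within` is made up for illustration, and it relies on the standard identity that a normally distributed value falls within k standard deviations of the mean with probability erf(k/√2).

```python
import math

def fraction_within(k: float) -> float:
    """Probability that a normally distributed value lies within
    k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.1%}")
```

The results come out at roughly 68.3, 95.4, and 99.7 percent, matching the rounded figures in the text.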

The bars in the graphs above represent the relative magnitudes of the entries in the 10th, 100th, and 1,000th rows of Pascal’s triangle (see chapter 4). The numbers along the horizontal axis indicate to which entry the bar refers. By convention, that labeling begins at 0, rather than 1 (the middle and bottom graphs have been truncated so that the entries whose bars would have negligible height are not shown).
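Since the graphs themselves are not reproduced here, the bar heights can be recomputed. The sketch below assumes "relative magnitude" means each entry divided by the largest entry in its row; the function name `relative_magnitudes` and the 1 percent cutoff used to decide which bars count as visible are choices made for this example, not taken from the book.

```python
import math

def relative_magnitudes(row: int) -> list[float]:
    """Entries 0..row of the given row of Pascal's triangle,
    scaled so that the largest entry has height 1."""
    entries = [math.comb(row, k) for k in range(row + 1)]
    peak = max(entries)
    return [entry / peak for entry in entries]

for row in (10, 100, 1000):
    bars = relative_magnitudes(row)
    # Count the bars whose height is at least 1 percent of the peak
    # (an arbitrary cutoff chosen for this illustration).
    visible = sum(1 for height in bars if height >= 0.01)
    print(f"row {row}: {len(bars)} entries, {visible} with height >= 1% of the peak")
```

As the row number grows, a shrinking share of the entries is tall enough to see, which is why the larger graphs can be truncated without losing anything visible.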

In order to visualize this, have a look at the graph in chapter 7. In this graph the data marked by squares concern the guesses made by 300 students, each observing a series of 10 coin flips [26]. Along the horizontal axis is plotted the number of correct guesses, from 0 to 10. Along the vertical axis is plotted the number of students who achieved that number of correct guesses. The curve is bell shaped, centered at 5 correct guesses, at which point its height corresponds to about 75 students. The curve falls to about two-thirds of its maximum height, corresponding to about 51 students, about halfway between 3 and 4 correct guesses on the left and between 6 and 7 on the right. A bell curve with this magnitude of standard deviation is typical of a random process such as guessing the result of a coin toss.
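The shape described above can be compared with what pure chance predicts. The following sketch assumes every guess is an independent 50/50 shot (the names `expected_students` and `P_CORRECT` are illustrative) and computes the expected number of students at each score, together with the standard deviation of the number of correct guesses.

```python
import math

STUDENTS = 300   # number of guessers, from the text
FLIPS = 10       # coin flips each student guesses, from the text
P_CORRECT = 0.5  # assumption: every guess is a pure 50/50 shot

def expected_students(correct: int) -> float:
    """Expected number of students with exactly `correct` right guesses."""
    probability = (
        math.comb(FLIPS, correct)
        * P_CORRECT**correct
        * (1 - P_CORRECT) ** (FLIPS - correct)
    )
    return STUDENTS * probability

# Standard deviation of the number of correct guesses per student.
std_dev = math.sqrt(FLIPS * P_CORRECT * (1 - P_CORRECT))

for correct in range(FLIPS + 1):
    print(f"{correct:2d} correct: {expected_students(correct):5.1f} students")
print(f"standard deviation: {std_dev:.2f} correct guesses")
```

Under these assumptions the peak comes out near 74 students and the standard deviation near 1.6 guesses. That puts the points one standard deviation from the center at about 3.4 and 6.6 correct guesses, in line with the description above.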

The same graph also displays another set of data, marked by circles. That set describes the performance of 300 mutual fund managers. In this case the horizontal axis represents not correct guesses of coin flips but the number of years (out of 10) that a manager performed above the group average. Note the similarity! We’ll get back to this in chapter 9.

A good way to get a feeling for how the normal distribution relates to random error is to consider the process of polling, or sampling. You may recall the poll I described in chapter 5 regarding the popularity of the mayor of Basel. In that city a certain fraction of voters approved of the mayor, and a certain fraction disapproved. For the sake of simplicity we will now assume each was 50 percent. As we saw, there is a chance that those involved in the poll would not reflect exactly this 50/50 split. In fact, if N voters were questioned, the chances that any given number of them would support the mayor are proportional to the numbers on line N of Pascal’s triangle. And so, according to De Moivre’s work, if pollsters poll a large number of voters, the probabilities of different polling results can be described by the normal distribution. In other words about 95 percent of the time the approval rating they observe in their poll will fall within 2 standard deviations of the true rating, 50 percent. Pollsters use the term margin of error to describe this uncertainty. When pollsters tell the media that a poll’s margin of error is plus or minus 5 percent, they mean that if they were to repeat the poll a
