Story of Psychology - Morton Hunt [141]
Galton had been led to consider this problem by an odd finding in his studies of hereditary genius: the children of unusual parents were generally less unusual. In terms of physical traits, for instance, the children of tall parents tended to be less tall, though still above average, and the children of short parents not as short, though still below average, a tendency Galton called “regression towards mediocrity” (later, the term became “regression towards the mean”). He wanted to know what it indicated about the strength of heredity and how he could express it mathematically. On the face of it, this seemed a purely intellectual puzzle; as it turned out, the solution to the problem would become one of the most useful research tools in psychology and many other sciences.
After pondering the matter for a long while, Galton set down a “scatter plot” of the heights of some three hundred children. First he created a grid, the horizontal dimension of which was children’s heights and the vertical dimension of which was parents’ heights (actually, the heights of “mid-parents”—the average of each parental pair). Then, in each cell of the grid (each intersection of a particular children’s height and a particular parental height) he wrote down the number of children who fit that category. The scatter plot looked like this:
For a time, it revealed nothing to him; then one morning, poring over it while waiting for a train, he suddenly saw a regularity in the numbers. If he drew a line connecting any set of approximately equal values, it would describe a tipped-over ellipse whose center point was the midpoint of the scatter plot (the averages for both parents and children). When he did so and then drew lines across the ellipse connecting its extreme horizontal and vertical points, they passed through the average height of children in each vertical column and the average height of parents in each horizontal row. It looked like this:
The ellipse and the lines crossing it revealed the relationship he had been looking for. At any given parental height (“Locus of horizontal tangential points”), the average height of the children was only about two-thirds as far from the mean (average) as that of the parents; that is, the children had “regressed” a third of the way toward the mean.[14] Conversely, for any children’s height (“Locus of vertical tangential points”), parents were somewhat closer to the mean (that is, parents of unusual children were less unusual than their children).
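The two-thirds relationship can be worked through as simple arithmetic. The sketch below is only an illustration: the function name, the population mean of 68 inches, and the mid-parent height of 71 inches are hypothetical values chosen for the example, not Galton's data.

```python
def expected_child_height(midparent_height, population_mean, regression_fraction=2 / 3):
    """Average child height predicted when children sit only a fraction
    (here two-thirds) as far from the mean as their mid-parent."""
    deviation = midparent_height - population_mean
    return population_mean + regression_fraction * deviation


# Hypothetical example: population mean 68 in, mid-parent height 71 in.
# The parents are 3 in above the mean, so the children are expected to
# average about 2 in above it.
print(expected_child_height(71.0, 68.0))
```

The same function applies to short parents: a mid-parent 3 inches below the mean would be expected to have children averaging about 2 inches below it.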
Galton had discovered the analytical device of the “regression line.” If the children’s heights had been exactly the same as the parents’, the two regression lines would have coincided; if the children’s heights had no relation whatever to the parents’, the regression lines would have been perpendicular to each other. As it was, they were fairly close, meaning that the relation between the two variables in this case (their correlation) was about midway between total and nil.
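The behavior of the two regression lines can be checked numerically. The following is a minimal sketch using ordinary least squares, which is an assumption about method rather than Galton's own graphical procedure; the function name and the five pairs of heights are hypothetical.

```python
from statistics import mean


def regression_slopes(xs, ys):
    """Return (slope of y regressed on x, slope of x regressed on y),
    both fitted by least squares."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / syy


# Hypothetical mid-parent and child heights in inches (not Galton's data).
parents = [64.0, 66.0, 68.0, 70.0, 72.0]
children = [66.0, 66.5, 68.0, 69.5, 70.0]

child_on_parent, parent_on_child = regression_slopes(parents, children)

# The product of the two slopes equals the squared correlation:
# 1 when the lines coincide, 0 when they are perpendicular.
print(child_on_parent, parent_on_child, child_on_parent * parent_on_child)
```

When the two variables are unrelated, both slopes are zero, so on a single plot one line runs horizontally and the other vertically, which is the perpendicular case described above.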
That was in 1886. Ten years later the British biometrician Karl Pearson, a Galton disciple and later his biographer, worked out a mathematical means of calculating the “coefficient of correlation” (which he called r, for regression) without any need to create scatter plots. For any two sets of data, it would show a correlation ranging from 1 (a perfect one-to-one covariation) through 0 (no relationship whatever) to −1 (a totally inverse relationship). The Pearsonian method has been the standard way of evaluating correlation to this day. In the case of parents and children, r turned out to be .47 (somewhat different from Galton’s first calculations): that is, children averaged about half as far from the population’s average as their parents.[15]
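The coefficient can be computed directly from paired data, with no plotting. The sketch below uses the standard product-moment formula (the sum of products of deviations divided by the square root of the product of the summed squared deviations); the function name and the sample values are hypothetical and are not the data behind the .47 figure.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical illustrations of the two extremes of the scale.
print(pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))  # perfect covariation
print(pearson_r([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0]))  # totally inverse
```

Values between the extremes indicate a partial relationship; a result near .5 would correspond to the “about midway” case the text describes for parents and children.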
The importance of Galton’s discovery of correlation analysis can hardly be overestimated. It meant that whenever two variables change in the same direction (or the opposite direction), even though not to the same degree, they are correlated, and