Complexity_ A Guided Tour - Melanie Mitchell [115]
Several different research groups have found that the Web’s in-degree distribution can be described by a very simple rule: the number of pages with a given in-degree is approximately proportional to 1 divided by the square of that in-degree. Suppose we denote the in-degree by the letter k. Then
Number of Web pages with in-degree k is proportional to .
(There has been some disagreement in the literature as to the actual exponent on k but it is close to 2—see the notes for details.) It turns out that this rule actually fits the data only for values of in-degree (k) in the thousands or greater.
To demonstrate why the Web is called “scale free,” I’ll plot the in-degree distribution as defined by this simple rule above, at three different scales. These plots are shown in figure 15.6. The first graph (top) plots the distribution for 9,000 in-degrees, starting at 1,000, which is close to where the rule becomes fairly accurate. Similar to figure 15.3, the in-degree values between 1,000 and 10,000 are shown on the horizontal axis, and their frequency (number of pages with the given in-degree) by the height of the boxes along the vertical axis. Here there are so many boxes that they form a solid black region.
The plots don’t give the actual values for frequency since I want to focus on the shape of the graph (not to mention that as far as I know, no one has very good estimates for the actual frequencies). However, you can see that there is a relatively large number of pages with k = 1,000 in-links, and this frequency quickly drops as in-degree increases. Somewhere between k = 5,000 and k = 10,000, the number of pages with k in-links is so small compared with the number of pages with 1,000 in-links that the corresponding boxes have essentially zero height.
What happens if we rescale—that is, zoom in on—this “near-zero-height” region? The second (middle) graph plots the in-degree distribution from k = 10,000 to k = 100,000. Here I’ve rescaled the plot so that the k = 10,000 box on this graph is at the same height as the k = 1,000 box on the previous graph. At this scale, there is now a relatively large number of pages with k = 10,000 in-links, and now somewhere between k = 50,000 and k = 100,000 we get “near-zero-height” boxes.
FIGURE 15.6. Approximate shape of the Web’s in-degree distribution at three different scales.
But something else is striking—except for the numbers on the horizontal axis, the second graph is identical to the first. This is true even though we are now plotting the distribution over 90,000 values instead of 9,000—what scientists call an order of magnitude more.
The third graph (bottom) shows the same phenomenon on an even larger scale. When we plot the distribution over k from 100,000 to 1 million, the shape remains identical.
A distribution like this is called self-similar, because it has the same shape at any scale you plot it. In more technical terms, it is “invariant under rescaling.” This is what is meant by the term scale-free. The term self-similarity might be ringing a bell. We saw it back in chapter 7, in the discussion of fractals. There is indeed a connection to fractals here; more on this in chapter 17.
SCALE-FREE DISTRIBUTIONS VERSUS BELL CURVES
Scale-free networks are said to have no “characteristic scale.” This is best explained by comparing a scale-free distribution with another well-studied distribution, the so-called bell-curve.
Suppose I plotted the distribution of adult human heights in the world. The smallest (adult) person in the world is a little over 2 feet tall (around 70 cm). The tallest person is somewhere close to 9 feet tall (around