Everything Is Obvious: *Once You Know the Answer - Duncan J. Watts
Finally, although many tweets are mundane updates (“Having coffee at Starbucks on Broadway! It’s a beautiful day!!”), many of them refer either to other online content, like breaking news stories and funny videos, or to other things in the world, like books, movies, and so on, about which Twitter users wish to express their opinions. And because the format of Twitter forces users to keep every message to no more than 140 characters, users often make use of “URL shorteners,” such as bit.ly, to replace the long, messy URL of the original website with something like http://bit.ly/beRKJo. The nice thing about these shortened URLs is that they effectively assign a unique code to every piece of content broadcast on Twitter. Thus when a user wishes to “retweet” something, it’s possible to see whom it came from originally, and thereby trace chains of diffusion across the follower graph.
In total, we tracked more than 74 million of these diffusion chains initiated by more than 1.6 million users, over a two-month interval in late 2009. For each event, we counted how many times the URL in question was retweeted—first by the original “seed” user’s immediate followers, then by their followers, and their followers’ followers, and so on—thereby tracing out the full “cascade” of retweets triggered by each original tweet. As the figure on this page shows, some of these cascades were broad and shallow, while others were narrow and deep. Others still were very large, with complex structure, starting out small and trickling along before gaining momentum somewhere else in the network. Most of all, however, we found that the vast majority of attempted cascades—roughly 98 percent of the total—didn’t actually spread at all.
[Figure: Cascades on Twitter]
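The counting described above can be made concrete with a minimal Python sketch. It is illustrative only, not the code actually used in the study. It assumes each retweet has already been resolved (for example, through its shortened URL) into a hypothetical `(parent, child)` pair, meaning `child` retweeted the link after seeing it from `parent`. It also assumes each user retweets a given link at most once, so every cascade forms a tree rooted at the seed. The names `cascade_stats` and `fraction_that_did_not_spread` are invented for illustration. Size counts the seed plus every retweeter, depth counts generations of retweets, and width is the largest single generation. Width and depth together give one way to separate broad, shallow cascades from narrow, deep ones.

```python
from collections import defaultdict


def cascade_stats(seed, retweets):
    """Return (size, depth, width) for one cascade.

    Assumption (hypothetical data format): `retweets` is a list of
    (parent, child) pairs, and each user appears as a child at most once,
    so the cascade is a tree rooted at `seed`.
    """
    # Build an adjacency list from each user to the users who retweeted them.
    children = defaultdict(list)
    for parent, child in retweets:
        children[parent].append(child)

    size, depth, width = 1, 0, 0  # the seed alone: size 1, no retweets yet
    frontier = [seed]
    # Walk the tree one generation at a time (breadth-first).
    while True:
        next_gen = [c for node in frontier for c in children[node]]
        if not next_gen:
            break
        depth += 1
        size += len(next_gen)
        width = max(width, len(next_gen))
        frontier = next_gen
    return size, depth, width


def fraction_that_did_not_spread(cascades):
    """Share of (seed, retweets) cascades in which nobody retweeted the seed."""
    unspread = sum(1 for seed, rts in cascades if cascade_stats(seed, rts)[0] == 1)
    return unspread / len(cascades)


if __name__ == "__main__":
    broad_shallow = ("a", [("a", "b"), ("a", "c"), ("a", "d")])
    narrow_deep = ("a", [("a", "b"), ("b", "c"), ("c", "d")])
    no_spread = ("a", [])
    for seed, rts in (broad_shallow, narrow_deep, no_spread):
        print(cascade_stats(seed, rts))
    print(fraction_that_did_not_spread([broad_shallow, narrow_deep, no_spread]))
```

Under these assumptions, a cascade of size 1 is one of the attempted cascades that “didn’t actually spread at all.” The second function therefore reports the share of seeds whose tweet was never retweeted, the quantity that came to roughly 98 percent in the study.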
This result is important because, as I’ll discuss in more detail in the next chapter, if you want to understand why some things “go viral”—those occasional YouTube videos that attract millions of downloads, or funny messages that circulate wildly through e-mail or on Facebook—it’s a mistake to consider only the rare few that actually succeed. In most settings, unfortunately, it is only possible to study the “successes” for the simple reason that nobody bothers to keep track of all the failures, which have a tendency to get swept under the rug. On Twitter, however, we can keep track of every single event, no matter how small, thereby enabling us to learn who is influential, how much more influential than average they really are, and whether or not it is possible to tell the differences between individuals in a way that could potentially be exploited.
The way we went about this exercise was to imitate what a hypothetical marketer might try to do—that is, using everything known about the attributes and past performance of a million or so individuals, to predict how influential each of them will be in the future. Based on these predictions, the marketer could then “sponsor” some group of individuals to tweet whatever information it is trying to disseminate, thereby generating a series of cascades. The better the marketer can predict how large a cascade any particular individual can trigger, the more efficiently