The Secret Life of Pronouns: What Our Words Say About Us - James W. Pennebaker [121]

that is the average of the two writers. If Lennon uses a low rate of we-words and McCartney uses a high rate, it would follow that their collaboration would produce a moderate number of we-words.


• The synergy hypothesis. Even more interesting is the idea that when two people work closely together, they create a product unlike anything either of them would create alone. Their language style will be so distinctive that most people would not recognize who the author was. Wouldn’t it be great if the results supported this hypothesis? Come on, statistics, please, please me.

And the winner by a mile is, in fact, the synergy hypothesis. When Lennon and McCartney and when Madison and Hamilton were working together, they produced works that were strikingly different than works produced by the individual writers themselves. When collaborating, the Lennon-McCartney team produced lyrics that were much more positive, while using more I-words, fewer we-words, and much shorter words than either artist normally used on his own. Similarly, when Hamilton and Madison worked together they used much bigger words, more past tense, and fewer auxiliary verbs than either did on their own. In fact, across about seventy-five dimensions of language and punctuation, more than 90 percent of the time the collaborations scored either higher than both writers or lower than both writers did on their own, rather than somewhere in between.
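The comparison behind that 90 percent figure can be sketched in a few lines of Python. This is only an illustration of the logic, not the analysis actually used: the function names and the rates below are invented. For each language dimension, the collaborative rate is checked against the range spanned by the two solo rates. A value inside the range fits the average hypothesis; a value outside it fits the synergy hypothesis.

```python
def classify_dimension(solo_a, solo_b, joint):
    """Say where the joint rate falls relative to the two solo rates."""
    low, high = min(solo_a, solo_b), max(solo_a, solo_b)
    if joint > high:
        return "higher"   # above both writers: consistent with synergy
    if joint < low:
        return "lower"    # below both writers: consistent with synergy
    return "between"      # inside the range: consistent with averaging


def synergy_share(dimensions):
    """Fraction of dimensions where the joint rate is outside the solo range."""
    outside = sum(
        1
        for solo_a, solo_b, joint in dimensions.values()
        if classify_dimension(solo_a, solo_b, joint) != "between"
    )
    return outside / len(dimensions)


# Invented rates (percent of all words), for illustration only.
# Each entry is (writer A alone, writer B alone, the two together).
example = {
    "i_words": (5.0, 6.0, 8.0),
    "we_words": (2.0, 3.0, 1.0),
    "positive_emotion": (3.0, 4.0, 3.5),
    "big_words": (10.0, 12.0, 9.0),
}

print(synergy_share(example))  # 0.75
```

With these made-up numbers, three of the four dimensions fall outside the solo range, so the share is 0.75.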

Note that collaborations produce quite different language patterns than what the individuals would naturally do on their own. What’s not yet known is whether collaborative work is generally better than individual products. This is a research question that is begging to be answered.

SUMMING UP: PACKING YOUR AUTHOR IDENTIFICATION TOOL KIT

Author identification is becoming a very hot topic in the computer world. The three methods that we have relied on involve tracking the rate of function word usage, analyzing punctuation and layout, and examining the use of obscure words. Each of these methods does far better than chance in identifying characteristics of an author as well as matching the author’s writing to other writing samples.
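As a rough sketch of the first method, the rate of function word usage can be computed by counting how often each function word appears per hundred words. The short word list, the function names, and the distance measure below are assumptions made for illustration; real analyses rely on much larger dictionaries and more careful statistics.

```python
import re
from collections import Counter

# A tiny illustrative list, NOT a complete function-word dictionary.
FUNCTION_WORDS = {"i", "we", "you", "the", "a", "an", "of", "to",
                  "and", "but", "is", "are"}


def function_word_rates(text):
    """Return each function word's rate as a percentage of all words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {}
    counts = Counter(w for w in words if w in FUNCTION_WORDS)
    return {w: 100 * c / len(words) for w, c in counts.items()}


def profile_distance(rates_a, rates_b):
    """Sum of absolute rate differences; smaller means more similar."""
    keys = set(rates_a) | set(rates_b)
    return sum(abs(rates_a.get(k, 0.0) - rates_b.get(k, 0.0)) for k in keys)


sample = "We are the champions and we are the very best"
rates = function_word_rates(sample)
print(rates["we"])  # 20.0
```

To match an unknown text to candidate authors, one simple approach is to compute `profile_distance` between its rates and each author's known rates and pick the smallest value.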

In terms of understanding the author’s personality, we currently know the most about function words. As discussed throughout the book, pronouns, articles, and other stealth words have reliably been linked to the authors’ age, sex, social class, personality, and social connections. Less is currently known about punctuation and personality, but I suspect future research will begin demonstrating convincing links. After all, it’s hard to imagine that there isn’t a difference between the writer who writes at the end of his or her note, “Thanks.” versus one who writes, “Thanks!!!!!!!!!”

The least is known about the use of relatively obscure words and their link to personality. If one author uses “intriguing” and another “remarkable,” does the choice of the word itself say anything about the person?

There are also a number of other exciting methods being developed by labs around the world that are relevant to author identification. One strategy is to look at something called N-grams. These can be pairs of words (or bigrams), three words in a row (or trigrams), etc. Looking at the beginning of this paragraph, the bigram approach would look at the occurrence of “there are,” “are also,” “also a,” and so forth. The idea is that some people naturally use groups of words together in a unique way that identifies who they are.
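The bigram example above can be reproduced with a short Python sketch. The helper name `ngrams` and the simple lowercase tokenizer are assumptions made for illustration, not part of any particular lab's method.

```python
import re


def ngrams(text, n=2):
    """Return every run of n consecutive words as a tuple."""
    words = re.findall(r"[a-z']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "There are also a number of other exciting methods"

print(ngrams(sentence)[:3])
# [('there', 'are'), ('are', 'also'), ('also', 'a')]

print(ngrams(sentence, n=3)[0])
# ('there', 'are', 'also')
```

Counting how often each tuple appears across an author's writing (for example with `collections.Counter`) gives a profile of favored word groupings that can be compared across texts.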

More elaborate strategies attempt to mathematically predict word order within sentences based on the words the writer has already used. In the beginning of the last paragraph, the odds that it would start with the word “there” might be 1 in 1,000. The odds that the word “are” would be the second word, knowing that the first word is “there,” are perhaps 1 in 20. Knowing “There are,” the odds that the third word is “also” … you can get the idea. Researchers can determine how unique a person’s writing is and how much it deviates from chance on a sentence-by-sentence level. One argument is that every person
