The Secret Life of Pronouns: What Our Words Say About Us - James W. Pennebaker [5]
Thanks to Martha’s programming skills and the thousands of hours spent by our student judges, LIWC was eventually ready to go. The final program instantly analyzed computer-based text or document files and calculated the percentage of words associated with each dictionary. The most recent version of LIWC can analyze thousands of individual digital files in a matter of seconds. Although our initial studies all focused on trauma essays, we eventually moved to poems, novels, blogs, Twitter feeds, letters, IMs, transcripts of conversations, and any other documents that contain words.
To appreciate how a word-counting program works, let’s look at the first two sentences of Lewis Carroll’s Alice’s Adventures in Wonderland:
Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, “and what is the use of a book,” thought Alice “without pictures or conversation?”
So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her.
LIWC would begin its analysis by first counting all the words in the text, which, in this case, is 113. It would then look at each word separately to determine if it was included in any of the existing dictionaries. So, for example, LIWC would first see the word Alice but would find no such word in any of its dictionaries. It would then move to the word was. Voilà! The word was would be in several dictionaries, including the verb dictionary, the auxiliary-verb dictionary, and the past-tense verb dictionary. The count for each of those dictionaries would now be 1. As LIWC proceeded in its task, it would then locate the next word, beginning, in the time dictionary; to in the preposition dictionary; and so forth. Finally, after evaluating all 113 words in the text and assigning each of them to the relevant dictionaries, LIWC would then calculate the percentage of total words that are linked to each dictionary. So, for example, in this passage, about 7 percent of all the words are personal pronouns, 9 percent are articles, and 3.6 percent are words related to emotion.
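The counting procedure described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not LIWC's actual code: the dictionaries below are tiny, made-up stand-ins for the real ones, and the names `DICTIONARIES`, `tokenize`, and `category_percentages` are hypothetical. The way the sketch splits text into words is also an assumption, so its word totals may not match LIWC's.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries for illustration only; the real LIWC
# dictionaries are far larger and were built by human judges.
DICTIONARIES = {
    "verb": {"was", "had", "is", "thought"},
    "auxiliary_verb": {"was", "had", "is"},
    "past_tense": {"was", "had", "thought"},
    "time": {"beginning", "once", "twice"},
    "preposition": {"to", "of", "by", "on", "into", "in"},
    "article": {"a", "an", "the"},
    "personal_pronoun": {"she", "her", "i", "you"},
}


def tokenize(text):
    """Lowercase the text and split it into words (an assumed, simple rule)."""
    return re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())


def category_percentages(text, dictionaries=DICTIONARIES):
    """Return the total word count and the percentage of words per dictionary."""
    words = tokenize(text)
    counts = Counter()
    for word in words:
        # A single word may belong to several dictionaries at once,
        # as "was" does in the passage above.
        for category, vocabulary in dictionaries.items():
            if word in vocabulary:
                counts[category] += 1
    total = len(words)
    percentages = {
        category: (100.0 * counts[category] / total if total else 0.0)
        for category in dictionaries
    }
    return total, percentages


opening = "Alice was beginning to get very tired of sitting by her sister on the bank"
total, percentages = category_percentages(opening)
print(total)
for category, pct in percentages.items():
    print(f"{category}: {pct:.1f}%")
```

Because each word is checked against every dictionary, was adds one to the verb, auxiliary-verb, and past-tense counts at the same time, while Alice adds to none of them.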
In analyzing a text, LIWC had many advantages over my troublesome human experts. Programs such as LIWC are 100 percent reliable in that you get the same results every time you run the program on a particular text. They are very fast, able to analyze the collected works of Shakespeare in under twenty seconds. And the results from the analysis of one person’s text can be directly compared with those of anyone else’s.
Despite these admirable features, word-counting programs are also remarkably stupid. They can’t detect irony or sarcasm and are singularly lacking in a sense of humor. Particularly damning is that they fail to capture the context of language. One word, for example, can have very different meanings depending on how it is used.
Consider the word mad. The LIWC program counts mad in its anger and negative-emotion dictionaries. If someone said, “I’m mad at you for kissing my new boyfriend,” LIWC’s interpretation of mad as an anger word would no doubt be correct. But if the same person said, “I’m mad about my new boyfriend,” LIWC would be mistaken to sort mad according to its given definitions. In this case, mad means not angry but “crazily happy.” Or, returning to Alice in Wonderland, the Mad Tea Party was not a hostile affair. Rather, mad in this context simply means “peculiar” or “insane.”
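A small sketch makes this blindness to context concrete. The `ANGER_WORDS` set and the `anger_count` function are hypothetical stand-ins, not LIWC's actual anger dictionary or code; the point is only that a pure word counter scores both sentences identically.

```python
import re

# Hypothetical stand-in for an anger dictionary; illustration only.
ANGER_WORDS = {"mad", "angry", "furious"}


def anger_count(sentence):
    """Count words that appear in the anger dictionary, ignoring context."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(1 for word in words if word in ANGER_WORDS)


angry = "I'm mad at you for kissing my new boyfriend"
smitten = "I'm mad about my new boyfriend"

# Both sentences get the same anger score, although only the first is angry.
print(anger_count(angry), anger_count(smitten))
```

The counter sees the same word in both sentences and has no way of knowing that at signals anger and about signals infatuation.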
LIWC, like almost all word-counting systems, makes lots of errors. It is a probabilistic system. Sometimes it classifies correctly and sometimes it doesn’t. We have now run enough studies to determine that statistically it is usually correct, and the good news is that the more words there are available to analyze,