Beautiful Code [321]
Speller is treated as a special case. Its contents are not dynamic. However, the large number of leaf nodes it contains, besides the fact that it contains every word in the vocabulary tree, means it could not be indexed either. It is populated only as needed—i.e., the children of a node in the speller subtree are created only when it is selected.
30.2.4. Simple Typing
The Type subtree contains three nodes that help you do plain typing. Under Speller appear all the letters from a through z, which allow you to pick the first letter of the word you desire. You are then presented similar choices for the next letter, but only if that combination occurs at the start of a word in the dictionary. In this way, you pick letter by letter, until you have the full word. At this point, the node at which you find yourself may or may not be a leaf node. If it is a leaf node, you can type it by simply clicking it. But often it is not.
"Vocabularytree" and "commonwords," described later, are other nodes that make it easy for you to type. However, if the system's prediction feature is working well, which happens if you are trying to make a sentence similar to one in the database, you do not need these facilities often.
30.2.5. Prediction: Word Completion and Next Word
In the predictor database are several tables. One is a simple list of roughly 250,000 words used to populate the Word Completion subtree. A user who has typed one or more starting characters of a word can use this list to type the rest of the word, suggestions for which are shown to the right of the screen, in the above half, as shown in Figure 30-1. This table is available to the user in its entirety via the Speller subtree.
Say you wish to type the word instant. This is not a leaf node because words such as instantaneous exist that begin with instant. Hence, to type instant, you select each of the seven characters in turn, and then when instant is highlighted, you use a long click to invoke the Type This option.
Another table has the fields word1, word2, and frequency. To populate this table, a long list of sentences are provided to a piece of companion software, dbmanager, which tabulates how often each word follows each other word. Once you have typed a word, this table is queried and the Next Word subtree populated, so that it provides the user a list of words that are likely to follow this one.
Each sentence entered by the user through eLocutor is copied into the file mailtomehtaatvsnldotcom.txt. The reason for this filename was to gently encourage the user to mail me samples of text he had generated using eLocutor, so that I might get some ideas about how to make it more efficient. Users are advised to edit this file and remove whatever is inappropriate before feeding it to dbmanager, so that with time, prediction gets better. In case a software writer wishes to implement a better method of predicting the next word, all she has to do is to alter the query in the Access database; there is no need to delve into the eLocutor code for this.
A separate table lists combinations of punctuation characters occurring in the text supplied to the database, which are treated by eLocutor more or less as words.
It is hard for software to predict what the user might wish to type next, without a knowledge of semantics. We tried talking to linguists to see whether there was a reasonably easy way to make such predictions, but soon gave up. What we did instead was laboriously combine words into semantic groups under the "Vocabulary tree" subtree. For instance, the ancestry of "Boston" in the vocabulary tree is Nouns Places Cities. Of course, the user can use this subtree to actually type in words, but that isn't very convenient. The semantic subgroups are better