Classic Shell Scripting - Arnold Robbins [176]
ispell and aspell
There are two different, freely available spellchecking programs: ispell and aspell. ispell is an interactive spellchecker; it displays your file, highlighting any spelling errors and providing suggested changes. aspell is a similar program; for English it does a better job of providing suggested corrections, and its author would like it to eventually replace ispell. Both programs can be used to generate a simple list of misspelled words, and since aspell hopes to replace ispell, they both use the same options:
-l
Print a list of misspelled words on standard output.
-p file
Use file as a personal dictionary of correctly spelled words. This is similar to the personal dictionary that Unix spell accepts as a first argument of the form +file.
The ispell home page is http://ficus-www.cs.ucla.edu/geoff/ispell.html, and the source may be found at ftp://ftp.gnu.org/gnu/non-gnu/ispell/.[5] The aspell home page is http://aspell.net/, and the source is at ftp://ftp.gnu.org/gnu/aspell/.
Both programs provide basic batch spellchecking. They also share the same quirk, which is that their results are not sorted, and duplicate bad words are not suppressed. (Unix spell has neither of these problems.) Thus, one prominent GNU/Linux vendor has the following shell script in /usr/bin/spell:
#!/bin/sh
# aspell -l mimicks the standard unix spell program, roughly.
cat "$@" | aspell -l --mode=none | sort -u
The --mode option causes aspell to ignore certain kinds of markup, such as SGML and TeX. Here, --mode=none indicates that no filtering should be done. The sort -u command sorts the output and suppresses duplicates, producing output of the nature expected by an experienced Unix user. This could also be done using ispell:
cat "$@" | ispell -l | sort -u
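To see what the sort -u stage contributes on its own, here is a minimal sketch. The function raw_list is hypothetical, and its words are invented sample data standing in for the unsorted, duplicate-laden output of aspell -l or ispell -l; neither spellchecker is actually run.

```shell
#!/bin/sh
# raw_list is a hypothetical stand-in for the raw output of
# `aspell -l` or `ispell -l`: unsorted, with repeated words.
# The words themselves are invented sample data.
raw_list() {
    printf '%s\n' teh wrod teh recieve wrod
}

# sort -u orders the list and keeps a single copy of each word.
# LC_ALL=C pins the collation order so the result does not
# depend on the caller's locale.
raw_list | LC_ALL=C sort -u
```

The vendor script shown earlier applies this same sort -u step to real aspell output.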
We could enhance this script in two different ways to provide a personal dictionary the same way Unix spell does. The first replacement spell script is provided in Example 12-1.
Example 12-1. A spell replacement using ispell
#!/bin/sh
# Unix spell treats a first argument of `+file' as providing a
# personal spelling list. Let's do that too.
mydict=
case $1 in
+?*)    mydict=${1#+}           # strip off leading +
        mydict="-p $mydict"
        shift
        ;;
esac
cat "$@" | ispell -l $mydict | sort -u
This works by simply looking for a first argument that begins with +, saving it in a variable, stripping off the + character, and then prepending the -p option. This is then passed on to the ispell invocation.
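The argument handling can be tried in isolation. The following is a sketch only: parse_plus is a hypothetical helper, not part of Example 12-1, that prints the option string the script would build for a given first argument. The filenames are made up for illustration.

```shell
#!/bin/sh
# parse_plus is a hypothetical helper that isolates the argument
# handling from Example 12-1. It prints the ispell option string
# for a first argument of the form +file, and nothing otherwise.
parse_plus() {
    case $1 in
    +?*)    # a + followed by at least one more character
        printf '%s\n' "-p ${1#+}"   # ${1#+} strips the leading +
        ;;
    esac
}

parse_plus +mydict.txt      # prints: -p mydict.txt
parse_plus chapter1.txt     # prints nothing: no leading +
parse_plus +                # prints nothing: +?* needs a character after the +
```

Because the pattern is +?* rather than +*, a bare + is left alone and treated as an ordinary filename argument.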
Unfortunately, this same technique does not work with aspell: it wants its dictionaries to be in a compiled binary format. To use aspell, we instead resort to the fgrep program, which can match multiple strings provided in a file. We add the -v option, which causes fgrep to print lines that do not match. The second replacement spell script is provided in Example 12-2.
Example 12-2. A spell replacement using aspell
#!/bin/sh
# Unix spell treats a first argument of `+file' as providing a
# personal spelling list. Let's do that too.
mydict=cat
case $1 in
+?*)    mydict=${1#+}           # strip off leading +
        mydict="fgrep -v -f $mydict"
        shift
        ;;
esac
# aspell -l mimics the standard Unix spell program, roughly.
cat "$@" | aspell -l --mode=none | sort -u | eval $mydict
This same trick of post-processing with fgrep can be used with Unix spell if you do not want to have to keep your personal dictionary sorted, or if you do not want to have to worry about different locales' sorting order.
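The filtering step can be sketched on its own, without a spellchecker. Everything named here is an assumption for illustration: filter_known is a hypothetical helper, and the dictionary and candidate words are invented. The sketch uses grep -F, the POSIX spelling of fgrep, and adds the -x option, which Example 12-2 does not use. Without -x, a dictionary entry removes every line that merely contains it, so an entry of awk would also hide the misspelling awkk.

```shell
#!/bin/sh
# filter_known is a hypothetical helper: it reads candidate words on
# standard input and drops those listed, one per line, in the file $1.
#   -F  treat each dictionary line as a fixed string (fgrep behavior)
#   -v  print only the lines that do NOT match
#   -x  require a whole-line match (an addition to Example 12-2)
#   -f  read the strings to match from a file
filter_known() {
    grep -F -v -x -f "$1"
}

# Invented sample personal dictionary; it need not be sorted.
dict=$(mktemp) || exit 1
trap 'rm -f "$dict"' EXIT
printf '%s\n' gawk awk > "$dict"

# Invented candidate list, as a spellchecker might report it.
# awk and gawk are removed; awkk and teh remain.
printf '%s\n' awk awkk gawk teh | filter_known "$dict"
```

Since the match is by fixed string rather than by merge against a sorted list, neither the dictionary's order nor the locale's collation rules affect the result.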
The next section presents an awk version of spell, which provides a simple yet powerful alternative to the various spell replacements discussed here.
* * *
[4] The spell(1) manual page, in the BUGS section, has long noted that "British spelling was done by an American."
[5] emacs uses ispell for interactive spellchecking. This is fast, since ispell is kept running in the background.
A Spellchecker in awk
In this section, we present a program for checking spelling. Even though all Unix systems have spell, and many also have aspell or ispell, it