Classic Shell Scripting - Arnold Robbins [186]
awk programs that do a lot of pattern matching are usually limited by the complexity of that operation, which runs entirely at native speeds. Such programs can seldom be improved much by rewriting in a compiled language, like C or C++. Each of the three awk implementations that we mentioned in this chapter was written completely independently of the others, and thus they may have quite different relative execution times for particular statements.
Because we have written lots of software tools in awk, some of which have been used on gigabytes of data, runtime efficiency has sometimes been important to us. A few years ago, one of us (NHFB) prepared pawk,[8] a profiling version of the smallest implementation, nawk. pawk reports both statement counts and times. Independently, the other (AR) added similar support for statement counts to GNU gawk, so that pgawk is now routinely built as part of releases of version 3.1.0 or later. pgawk writes an output profile to awkprof.out, containing a program listing annotated with statement execution counts. The counts readily identify the hot spots, and zero (or empty) counts identify code that has never been executed, so the profile also serves as a test coverage report. Such reports are important when test files are prepared to verify that all statements of a program are executed during testing: bugs are likely to lurk in code that is seldom, or never, executed.
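As a rough illustration of collecting such a profile, the sketch below runs a tiny awk program under the gawk profiler. It rests on assumptions not stated in the text: gawk 4.1 or later, where profiling is enabled with the --profile option of gawk itself (on gawk 3.1.x, the separate pgawk program plays that role), and hypothetical file names count.awk and input.txt invented for the demonstration.

```shell
#!/bin/sh
# Sketch only: gather statement counts for a small awk program.
# Assumption: gawk 4.1 or later, where --profile=FILE is handled by gawk
# itself; with gawk 3.1.x, run pgawk in place of gawk.

profile_demo() {
    out_dir=$1      # directory for the hypothetical demo files

    command -v gawk > /dev/null 2>&1 || {
        echo "gawk not found; skipping profile demo" >&2
        return 1
    }

    # A three-rule program: one rule per comment line, one per data line.
    cat > "$out_dir/count.awk" <<'EOF'
/^#/  { comments++; next }
      { lines++ }
END   { printf "%d lines, %d comments\n", lines, comments }
EOF

    # Three lines of made-up input: one comment and two data lines.
    printf '%s\n' '# header' 'alpha' 'beta' > "$out_dir/input.txt"

    # The annotated listing is written to awkprof.out when gawk exits.
    gawk --profile="$out_dir/awkprof.out" \
         -f "$out_dir/count.awk" "$out_dir/input.txt"
}

demo_dir=${TMPDIR:-/tmp}/awkprof-demo.$$
mkdir -p "$demo_dir" && profile_demo "$demo_dir" && cat "$demo_dir/awkprof.out"
```

In the resulting listing, a rule that never fired would show no count, which is how the profile doubles as a coverage report.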
Accurate execution timing has been harder to acquire because typical CPU timers have resolutions of only 60 to 100 ticks per second, which is completely inadequate in an era of GHz processors. Fortunately, some Unix systems now provide low-cost, nanosecond-resolution timers, and pawk uses them on those platforms.
* * *
[6] Available at ftp://labrea.stanford.edu/pub/dict/words.gz.
[7] See http://www.tuhs.org/.
[8] Available at http://www.math.utah.edu/pub/pawk/.
Summary
The original spellchecking prototype shows the elegance and power of the Unix Software Tools approach. With only one special-purpose program, an afternoon's worth of work created a usable and useful tool. As is often the case, experience with a prototype in shell was then applied to writing a production version in C.
The use of a private dictionary is a powerful feature of Unix spell. Although the addition of locales to the Unix milieu introduced some quirks, private dictionaries remain valuable; indeed, for each chapter of this book, we created private dictionaries to make spellchecking our work more manageable.
The freely available ispell and aspell programs are large and powerful, but lack some of the more obvious features needed to make their batch modes useful. We showed how simple shell script wrappers let us work around these deficiencies and adapt the programs to suit our needs. This is one of the most typical uses of shell scripting: to take a program that does almost what you need and modify its results slightly to do the rest of your job. This also fits in well with the "let someone else do the hard part" Software Tools principle.
Finally, the awk spellchecker nicely demonstrates the elegance and power of that language. In one afternoon, one of us (NHFB) produced a program of fewer than 200 lines that can be (and is!) used for production spellchecking.
Chapter 13. Processes
A process is an instance of a running program. New processes are started by the fork() and execve() system calls, and normally run until they issue an exit() system call. The details of the fork() and execve() system calls are complex and not needed for this book. Consult their manual pages if you want to learn more.
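Seen from the shell, that life cycle can be sketched as follows. This is only an illustration under stated assumptions: the shell makes the underlying system calls on our behalf, and the exit status 3 is an arbitrary value chosen for the demonstration.

```shell
#!/bin/sh
# Sketch only: watch a child process start, run, and exit.

# The shell creates a child process for the background command;
# that child then runs a new sh, which exits with status 3.
sh -c 'exit 3' &
child=$!            # process ID of the most recent background command

# wait blocks until that child terminates and returns its exit status.
status=0
wait "$child" || status=$?

echo "child process $child exited with status $status"
```

Capturing the status with `|| status=$?` keeps the script usable under `set -e`, where a bare nonzero return from wait would otherwise end the script.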
Unix systems have always supported multiple processes. Although the computer seems to be doing several things at once, in reality, this