Classic Shell Scripting - Arnold Robbins [123]
Initializations elsewhere on the command line are done as the arguments are processed, and may be interspersed with filenames. For example:
awk '{...}' Pass=1 *.tex Pass=2 *.tex
processes the list of files twice, once with Pass set to one and a second time with it set to two.
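The two-pass behavior can be tried on a small scale. The following is a minimal sketch, assuming two scratch files (a.txt and b.txt, names chosen only for illustration) in place of *.tex:

```shell
# Scratch directory and filenames are illustrative, not from the original text.
tmpdir=$(mktemp -d)
printf 'alpha\n' > "$tmpdir/a.txt"
printf 'beta\n'  > "$tmpdir/b.txt"

# Pass is assigned when awk reaches each assignment argument,
# so the same files are read twice with different values of Pass.
two_pass=$(awk '{ print Pass ": " $0 }' \
    Pass=1 "$tmpdir/a.txt" "$tmpdir/b.txt" \
    Pass=2 "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$two_pass"

rm -rf "$tmpdir"
```

Each record is printed once per pass, prefixed with the value Pass had when its file was opened.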
Initializations with string values need not be quoted unless the shell requires such quoting to protect special characters or whitespace.
The special filename - (hyphen) represents standard input. Most modern awk implementations, but not POSIX, also recognize the special name /dev/stdin for standard input, even when the host operating system does not support that filename. Similarly, /dev/stderr and /dev/stdout are available for use within awk programs to refer to standard error and standard output.
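As a brief sketch of these special names, the commands below read piped data through the - operand and write a diagnostic to /dev/stderr. The sample input and message text are invented for illustration, and the second command assumes an awk that recognizes /dev/stderr (or a system that provides it):

```shell
# "-" names standard input, so piped data is read like a file operand.
from_stdin=$(printf 'one\ntwo\n' | awk '{ print NR ": " $0 }' -)
printf '%s\n' "$from_stdin"

# Output redirected to "/dev/stderr" goes to standard error.
# Standard error is discarded here, so only "ok" is captured.
only_stdout=$(printf 'x\n' |
    awk '{ print "warning: saw " $0 > "/dev/stderr"; print "ok" }' 2>/dev/null)
printf '%s\n' "$only_stdout"
```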
* * *
[1] The GNU documentation reader, info, is part of the texinfo package available at ftp://ftp.gnu.org/gnu/texinfo/. The emacs text editor also can be used to access the same documentation: type Ctrl-H i in an emacs session to get started.
The awk Programming Model
awk views an input stream as a collection of records, each of which can be further subdivided into fields. Normally, a record is a line, and a field is a word of one or more nonwhitespace characters. However, what constitutes a record and a field is entirely under the control of the programmer, and their definitions can even be changed during processing.
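The record and field definitions are controlled by the built-in variables RS (record separator) and FS (field separator), which are covered in detail elsewhere. The sketch below, with made-up input, shows the defaults, a changed record separator, and a field separator changed partway through the input:

```shell
# Default: each line is a record, each run of nonwhitespace is a field.
default_split=$(printf 'the quick  brown fox\n' | awk '{ print NF, $2 }')
printf '%s\n' "$default_split"

# Records separated by semicolons instead of newlines.
semi_records=$(printf 'a b;c d;e f\n' | awk 'BEGIN { RS = ";" } { print NR ": " $1 }')
printf '%s\n' "$semi_records"

# Changing FS during processing: the new value applies to later records.
mid_change=$(printf 'a b\nc,d\n' | awk 'NR == 1 { print $2; FS = "," } NR == 2 { print $2 }')
printf '%s\n' "$mid_change"
```

In the last command, the first record is split on whitespace and the second on commas, because the assignment to FS takes effect starting with the next record read.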
An awk program consists of pairs of patterns and braced actions, possibly supplemented by functions that implement the details of the actions. For every input record, awk examines all of the patterns, and executes the action of each pattern that matches.
Either part of a pattern/action pair may be omitted. If the pattern is omitted, the action is applied to every input record. If the action is omitted, the default action is to print the matching record on standard output. Here is the typical layout of an awk program:
pattern { action }    Run action if pattern matches
pattern               Print record if pattern matches
{ action }            Run action for every record
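The three forms can appear together in one program. This is a small sketch on invented data (a fruit name and a count per line); the variable names big and total are arbitrary:

```shell
layout_demo=$(printf 'apple 3\nbanana 12\ncherry 7\n' | awk '
$2 > 5  { big++ }               # pattern and action
/^b/                            # pattern only: print the matching record
        { total += $2 }         # action only: runs for every record
END     { print big, total }
')
printf '%s\n' "$layout_demo"
```

The record "banana 12" is printed by the pattern-only rule as it is read, and the END action then reports that two records had a second field greater than 5 and that the second fields sum to 22.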
Input is switched automatically from one input file to the next, and awk itself normally handles the opening, reading, and closing of each input file, allowing the user program to concentrate on record processing. The code details are presented later in Section 9.5.
Although the patterns are often numeric or string expressions, awk also provides two special patterns with the reserved words BEGIN and END.
The action associated with BEGIN is performed just once, before any command-line files or ordinary command-line assignments are processed, but after any leading -v option assignments have been done. It is normally used to handle any special initialization tasks required by the program.
The END action is performed just once, after all of the input data has been processed. It is normally used to produce summary reports or to perform cleanup actions.
BEGIN and END patterns may occur in any order, anywhere in the awk program. However, it is conventional to make the BEGIN pattern the first one in the program, and to make the END pattern the last one.
When multiple BEGIN or END patterns are specified, they are processed in their order in the awk program. This allows library code included with extra -f options to have startup and cleanup actions.
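The ordering rules for BEGIN and END can be checked with a short experiment. In this sketch, the variable names early and late and the scratch file are hypothetical: early is set with -v, and late with an ordinary command-line assignment placed before the filename.

```shell
# Scratch input file, created only for this illustration.
tmpfile=$(mktemp)
printf '10\n20\n30\n' > "$tmpfile"

begin_end=$(awk -v early=set '
BEGIN { print "first BEGIN: early=" early ", late=" late }
BEGIN { print "second BEGIN" }
      { sum += $1 }
END   { print "first END: late=" late ", sum=" sum }
END   { print "second END" }
' late=set "$tmpfile")
printf '%s\n' "$begin_end"

rm -f "$tmpfile"
```

The first BEGIN action sees early but not late, since ordinary command-line assignments have not yet been processed at that point. Both BEGIN actions and both END actions run in the order in which they appear in the program.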
Program Elements
Like most scripting languages, awk deals with numbers and strings. It provides scalar and array variables to hold data, numeric and string expressions, and a handful of statement types to process data: assignments, comments, conditionals, functions, input, loops, and output. Many features of awk expressions and statements are purposely similar to ones in the C programming language.
Comments and Whitespace
Comments in awk run from sharp (#) to end-of-line, just like comments in the shell. Blank lines are equivalent to empty comments.
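A minimal sketch of comments and blank lines inside an awk program, using invented input:

```shell
comment_demo=$(printf '1\n2\n' | awk '
# A full-line comment.

{ n += $1 }     # A trailing comment; the blank line above is ignored.
END { print n }
')
printf '%s\n' "$comment_demo"
```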
Wherever whitespace