Classic Shell Scripting - Arnold Robbins [129]
$ awk 'BEGIN { for (k = 0; k < ARGC; k++)
> print "ARGV[" k "] = [" ARGV[k] "]" }' a b c
ARGV[0] = [awk]
ARGV[1] = [a]
ARGV[2] = [b]
ARGV[3] = [c]
Whether a directory path in the program name is visible or not is implementation-dependent:
$ /usr/local/bin/gawk 'BEGIN { print ARGV[0] }'
gawk
$ /usr/local/bin/mawk 'BEGIN { print ARGV[0] }'
mawk
$ /usr/local/bin/nawk 'BEGIN { print ARGV[0] }'
/usr/local/bin/nawk
The awk program can modify ARGC and ARGV, although it is rarely necessary to do so. If an element of ARGV is (re)set to an empty string, or deleted, awk ignores it, instead of treating it as a filename. If you eliminate trailing entries of ARGV, be sure to decrement ARGC accordingly.
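As a minimal sketch of both cases, the following blanks out a leading argument and then drops a trailing one. The name no-such-file is hypothetical and is assumed not to exist, so an awk that tried to open it would report an error instead of printing the line. The lone - argument stands for standard input.

```shell
# no-such-file is a hypothetical name, assumed not to exist.

# Blank out a leading argument: awk skips it and reads "-" (standard input).
first=$(printf 'hello\n' |
    awk 'BEGIN { ARGV[1] = "" } { print "read: " $0 }' no-such-file -)
printf '%s\n' "$first"

# Drop a trailing argument: delete it and decrement ARGC to match.
last=$(printf 'hello\n' |
    awk 'BEGIN { delete ARGV[2]; ARGC-- } { print "read: " $0 }' - no-such-file)
printf '%s\n' "$last"
```

Both commands should print read: hello, because in each case the only argument left for awk to open is standard input.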
awk stops interpreting arguments as options as soon as it has seen either an argument containing the program text, or the special -- option. Any following arguments that look like options must be handled by your program and then deleted from ARGV, or set to an empty string.
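One way to handle such an argument is to scan ARGV in a BEGIN action, as in this sketch. The --upper option is invented purely for illustration; it is not an option of any awk implementation. The trailing - asks awk to read standard input once the option has been blanked out.

```shell
# --upper is a hypothetical, program-defined option used only in this sketch.
out=$(printf 'hello\n' | awk 'BEGIN {
    for (k = 1; k < ARGC; k++)
        if (ARGV[k] == "--upper") {
            upper = 1       # remember that the option was given
            ARGV[k] = ""    # keep awk from treating it as a filename
        }
}
{ print (upper ? toupper($0) : $0) }' --upper -)
printf '%s\n' "$out"
```

With the option present, the input line comes back as HELLO. Without the assignment to ARGV[k], awk would try to open a file named --upper.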
It is often convenient to wrap the awk invocation in a shell script. To keep the script more readable, store a lengthy program in a shell variable. You can also generalize the script to allow the awk implementation to be chosen at runtime by an environment variable with a default of nawk:
#! /bin/sh -
AWK=${AWK:-nawk}
AWKPROG='
... long program here ...
'
$AWK "$AWKPROG" "$@"
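For a concrete instance of this pattern, here is a small sketch with the placeholder filled in by an illustrative line-and-word counter. It assumes a default of plain awk rather than nawk, only so that it runs on systems where nawk is not installed, and it feeds the program fixed input instead of "$@" so that the result is predictable.

```shell
#! /bin/sh -
# Assumption for this sketch: default to plain "awk" instead of nawk.
AWK=${AWK:-awk}
AWKPROG='
{ words += NF }                              # accumulate the field count
END { print "lines=" NR, "words=" words }    # report totals at end of input
'
# A real script would end with:  $AWK "$AWKPROG" "$@"
out=$(printf 'one two\nthree\n' | $AWK "$AWKPROG")
printf '%s\n' "$out"
```

This should print lines=2 words=3. Setting AWK=gawk or AWK=mawk in the environment selects a different implementation without editing the script.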
Single quotes protect the program text from shell interpretation, but more care is needed if the program itself contains single quotes. A useful alternative to storing the program in a shell variable is to put it in a separate file in a shared library directory that is found relative to the directory where the script is stored:
#! /bin/sh -
AWK=${AWK:-nawk}
$AWK -f "$(dirname "$0")/../share/lib/myprog.awk" -- "$@"
The dirname command was described in Section 8.2. For example, if the script is in /usr/local/bin, then the program is in /usr/local/share/lib. The use of dirname here ensures that the script will work as long as the relative location of the two files is preserved.
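The path computation can be tried on its own, as in this sketch. The script path below is a hypothetical stand-in for the value that $0 would hold at runtime. Quoting the expansions is a precaution for directory names that contain spaces.

```shell
# Hypothetical stand-in for $0; in a real script, use "$0" itself.
script=/usr/local/bin/myscript
libfile=$(dirname "$script")/../share/lib/myprog.awk
printf '%s\n' "$libfile"
```

This prints /usr/local/bin/../share/lib/myprog.awk, which the system resolves to /usr/local/share/lib/myprog.awk when the file is opened.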
Environment Variables
awk provides access to all of the environment variables as entries in the built-in array ENVIRON:
$ awk 'BEGIN { print ENVIRON["HOME"]; print ENVIRON["USER"] }'
/home/jones
jones
There is nothing special about the ENVIRON array: you can add, delete, and modify entries as needed. However, POSIX requires that subprocesses inherit the environment in effect when awk was started, and we found no current implementations that propagate changes to the ENVIRON array to either subprocesses or built-in functions. In particular, this means that you cannot control the possibly locale-dependent behavior of string functions, like tolower( ), with changes to ENVIRON["LC_ALL"]. You should therefore consider ENVIRON to be a read-only array.
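Because this behavior may differ across implementations and versions, it is worth probing the awk you actually use. The sketch below uses DEMO_VAR, a hypothetical variable name chosen for the test. The second result is deliberately not shown, since it depends on the implementation.

```shell
# DEMO_VAR is a hypothetical variable used only for this probe.
DEMO_VAR=original
export DEMO_VAR

# Reading: the inherited value is visible through ENVIRON.
seen=$(awk 'BEGIN { print ENVIRON["DEMO_VAR"] }')
printf 'awk sees:   %s\n' "$seen"

# Probe: change the entry inside awk, then ask a subprocess what it sees.
# "original" means the change was not propagated to the subprocess.
child=$(awk 'BEGIN { ENVIRON["DEMO_VAR"] = "changed"; system("echo \"$DEMO_VAR\"") }')
printf 'child sees: %s\n' "$child"
```

If the second line reports original, that awk keeps changes to ENVIRON to itself, and the read-only advice applies directly.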
If you need to control the locale of a subprocess, you can do so by setting a suitable environment variable in the command string. For example, you can sort a file in a Spanish locale like this:
system("env LC_ALL=es_ES sort infile > outfile")
The system( ) function is described later, in Section 9.7.8.
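The same technique can be tried without any files, as in this sketch. It assumes the C locale in place of es_ES, since C is always available, and it lets sort inherit standard input and output from awk rather than naming infile and outfile.

```shell
# Assumption: use the C locale so that the ordering is predictable everywhere.
# The BEGIN-only awk program reads no input itself, so sort consumes the pipe.
out=$(printf 'b\nA\na\nB\n' | awk 'BEGIN { system("env LC_ALL=C sort") }')
printf '%s\n' "$out"
```

In the C locale, sort compares bytes, so the uppercase letters come first and the output is A, B, a, b, one per line.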
Records and Fields
Each iteration of the implicit loop over the input files in awk's programming model processes a single record, typically a line of text. Records are further divided into smaller strings, called fields.
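A short sketch makes the two levels visible. For each record it prints the record number (NR), the number of fields (NF), and the first field. The input text is invented for the example.

```shell
# Two input lines become two records; whitespace divides each into fields.
out=$(printf 'alpha beta\ngamma delta epsilon\n' | awk '{ print NR, NF, $1 }')
printf '%s\n' "$out"
```

This prints 1 2 alpha followed by 2 3 gamma.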
Record Separators
Although records are normally text lines separated by newline characters, awk allows more generality through the record-separator built-in variable, RS.
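As a small illustration, this sketch sets RS to a semicolon, so that a single line of input is split into three records. The input is invented for the example and has no trailing newline, which keeps the last record free of one.

```shell
# With RS set to ";", each semicolon-terminated piece is one record.
out=$(printf 'one;two;three' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
printf '%s\n' "$out"
```

The output is three lines: 1: one, 2: two, and 3: three.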
In traditional and POSIX awk, RS must be either a single literal character, such as newline (its default value), or an empty string. The latter is treated specially: records are then paragraphs separated by one or more blank lines, and empty lines at the start or end of a file are ignored. Fields are then separated