Classic Shell Scripting - Arnold Robbins [128]
Arrays in awk require neither declaration nor allocation: array storage grows automatically as new elements are referenced. Array storage is sparse: only those elements that are explicitly referenced are allocated. This means that you can follow x[1] = 3.14159 with x[10000000] = "ten million", without filling in elements 2 through 9999999. Most programming languages with arrays require all elements to be of the same type, but that is not the case with awk arrays.
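The sparse-storage claim is easy to check at the command line. The following is a minimal sketch, assuming any POSIX-compatible awk; the shell variable name out is ours, used only to capture the result. It stores the two elements mentioned above and counts how many elements the array actually holds:

```shell
# Sketch: sparse storage and mixed element types in one awk array.
out=$(awk 'BEGIN {
    x[1] = 3.14159                 # numeric element
    x[10000000] = "ten million"    # string element, far from index 1
    n = 0
    for (k in x)                   # visits only the elements that exist
        n++
    print n
    print x[1]
    print x[10000000]
}')
printf '%s\n' "$out"
```

The count printed first is 2, not 10000000: the intervening indices were never referenced, so they occupy no storage.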
Storage can be reclaimed when elements are no longer needed. delete array[index] removes an element from an array, and recent awk implementations allow delete array to delete all elements. We describe another way to delete array elements at the end of Section 9.9.6.
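Both forms of delete can be tried in a short experiment. This is a sketch rather than a portability guarantee: the whole-array form assumes one of the recent implementations mentioned above, and the array contents are invented for illustration.

```shell
# Sketch: deleting one element, then every element, of an awk array.
out=$(awk 'BEGIN {
    x[1] = "a"; x[2] = "b"; x[3] = "c"
    delete x[2]                    # remove a single element
    print ((2 in x) ? "present" : "absent")
    n = 0; for (k in x) n++
    print n
    delete x                       # remove all elements (newer awks)
    n = 0; for (k in x) n++
    print n
}')
printf '%s\n' "$out"
```

The membership test (2 in x) is used instead of examining x[2] directly, because merely referencing x[2] would create the element again.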
A variable cannot be used as both a scalar and an array at the same time. Applying the delete statement removes the elements of an array, but not its name, so code like this:
x[1] = 123
delete x
x = 789
causes awk to complain that you cannot assign a value to an array name.
Sometimes, multiple indices are needed to uniquely locate tabular data. For example, the post office uses house number, street, and postal code to identify mail-delivery locations. A row/column pair suffices to identify a position in a two-dimensional grid, such as a chessboard. Bibliographies usually record author, title, edition, publisher, and year to identify a particular book. A clerk needs a manufacturer, style, color, and size to retrieve the correct pair of shoes from a stockroom.
awk simulates arrays with multiple indices by treating a comma-separated list of indices as a single string. However, because commas might well occur in the index values themselves, awk replaces the index-separator commas by an unprintable string stored in the built-in variable SUBSEP. POSIX says that its value is implementation-defined; generally, its default value is "\034" (the ASCII file-separator control character, FS, not to be confused with awk's FS variable), but you can change it if you need that string in the index values. Thus, when you write maildrop[53, "Oak Lane", "T4Q 7XV"], awk converts the index list to the string expression "53" SUBSEP "Oak Lane" SUBSEP "T4Q 7XV", and uses its string value as the index. This scheme can be subverted, although we do not recommend that you do so; these statements all print the same item:
print maildrop[53, "Oak Lane", "T4Q 7XV"]
print maildrop["53" SUBSEP "Oak Lane" SUBSEP "T4Q 7XV"]
print maildrop["53\034Oak Lane", "T4Q 7XV"]
print maildrop["53\034Oak Lane\034T4Q 7XV"]
Clearly, if you later change the value of SUBSEP, you will invalidate the indices of already-stored data, so SUBSEP really should be set just once per program, in the BEGIN action.
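Because a multiple-index key is just a string with SUBSEP between its parts, the individual indices can be recovered by splitting the key on SUBSEP. Here is a minimal sketch of that idea; the stored value "parcel" and the names key, part, and out are hypothetical, chosen only for this illustration:

```shell
# Sketch: store under a three-part index, then take the key apart again.
out=$(awk 'BEGIN {
    maildrop[53, "Oak Lane", "T4Q 7XV"] = "parcel"   # hypothetical value
    # Membership test with a parenthesized index list
    if ((53, "Oak Lane", "T4Q 7XV") in maildrop)
        print "found"
    for (key in maildrop) {
        n = split(key, part, SUBSEP)   # recover the individual indices
        print n
        for (i = 1; i <= n; i++)
            print part[i]
    }
}')
printf '%s\n' "$out"
```

The split works only if SUBSEP never occurs inside an index value, which is the reason for choosing an unprintable separator in the first place.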
You can solve an astonishingly large number of data processing problems with associative arrays, once you rearrange your thinking appropriately. For a simple programming language like awk, they have shown themselves to be a superb design choice.
Command-Line Arguments
awk's automated handling of the command line means that few awk programs need concern themselves with it. This is quite different from the C, C++, Java, and shell worlds, where programmers are used to handling command-line arguments explicitly.
awk makes the command-line arguments available via the built-in variables ARGC (argument count) and ARGV (argument vector, or argument values). Here is a short program to illustrate their use:
$ cat showargs.awk
BEGIN {
    print "ARGC =", ARGC
    for (k = 0; k < ARGC; k++)
        print "ARGV[" k "] = [" ARGV[k] "]"
}
Here is what it produces for the general awk command line:
$ awk -v One=1 -v Two=2 -f showargs.awk Three=3 file1 Four=4 file2 file3
ARGC = 6
ARGV[0] = [awk]
ARGV[1] = [Three=3]
ARGV[2] = [file1]
ARGV[3] = [Four=4]
ARGV[4] = [file2]
ARGV[5] = [file3]
As in C and C++, the arguments are stored in array