Classic Shell Scripting - Arnold Robbins [135]
for (name in telephone)
    print name "\t" telephone[name]
is unlikely to be in the order that you want. We show how to solve that problem in Section 9.7.7. The split() function, described in Section 9.9.6, handles the case of multiply-indexed arrays.
As in the shell, the break statement exits the innermost loop prematurely:
for (name in telephone)
    if (telephone[name] == "555-0136")
        break
print name, "has telephone number 555-0136"
However, the shell-style multilevel break n statement is not supported.
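Because awk lacks break n, one common workaround is a flag variable that the outer loop tests after the inner loop finishes. The following is only a minimal sketch of that idea: the variable names found, i, and j, and the search for two single-digit factors of 42, are illustrative assumptions, not part of the original text.

```shell
# Sketch: emulate a two-level break in awk with a flag variable.
result=$(awk 'BEGIN {
    found = 0
    for (i = 1; i <= 9; i++)
    {
        for (j = 1; j <= 9; j++)
        {
            if (i * j == 42)
            {
                found = 1       # remember why the inner loop ended
                break           # leaves only the inner loop
            }
        }
        if (found)
            break               # leaves the outer loop as well
    }
    if (found)
        print i, "*", j, "= 42"
}')
printf '%s\n' "$result"
```

Since both loops are left through break, neither i nor j is incremented again, so they still hold the values that satisfied the test.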
Just like in the shell, the continue statement jumps to the end of the loop body, ready for the next iteration. awk does not recognize the shell's multilevel continue n statement. To illustrate the continue statement, the program in Example 9-1 determines by brute-force testing of divisors whether a number is composite or prime (recall that a prime number is any whole number larger than one that has no integral divisors other than one and itself), and prints any factorization that it can find.
Example 9-1. Integer factorization
# Compute integer factorizations of integers supplied one per line.
# Usage:
#       awk -f factorize.awk
{
    n = int($1)
    m = n = (n >= 2) ? n : 2
    factors = ""
    for (k = 2; (m > 1) && (k^2 <= n); )
    {
        if (int(m % k) != 0)
        {
            k++
            continue
        }
        m /= k
        factors = (factors == "") ? ("" k) : (factors " * " k)
    }
    if ((1 < m) && (m < n))
        factors = factors " * " m
    print n, (factors == "") ? "is prime" : ("= " factors)
}
Notice that the loop variable k is incremented, and the continue statement executed, only when we find that k is not a divisor of m, so the third expression in the for statement is empty.
If we run it with suitable test input, we get this output:
$ awk -f factorize.awk test.dat
2147483540 = 2 * 2 * 5 * 107374177
2147483541 = 3 * 7 * 102261121
2147483542 = 2 * 3137 * 342283
2147483543 is prime
2147483544 = 2 * 2 * 2 * 3 * 79 * 1132639
2147483545 = 5 * 429496709
2147483546 = 2 * 13 * 8969 * 9209
2147483547 = 3 * 3 * 11 * 21691753
2147483548 = 2 * 2 * 7 * 76695841
2147483549 is prime
2147483550 = 2 * 3 * 5 * 5 * 19 * 23 * 181 * 181
Array Membership Testing
The membership test key in array is an expression that evaluates to 1 (true) if key is an index element of array. The test can be inverted with the not operator: !(key in array) is 1 if key is not an index element of array; the parentheses are mandatory.
For arrays with multiple subscripts, use a parenthesized comma-separated list of subscripts in the test: (i, j, ..., n) in array.
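The two forms above can be tried out with a short sketch such as the following. The grid array and its subscripts are hypothetical names chosen for illustration; the telephone entry reuses the directory from the surrounding examples. The parentheses matter because, in POSIX awk's precedence table, unary ! binds more tightly than in.

```shell
# Sketch: negated and multiple-subscript membership tests.
result=$(awk 'BEGIN {
    telephone["Carol"] = "555-0136"
    grid[1, 2] = "x"            # grid is a hypothetical two-subscript array

    if (!("Sally" in telephone))
        print "Sally is not listed"
    if ((1, 2) in grid)
        print "(1, 2) is in grid"
    if (!((2, 1) in grid))
        print "(2, 1) is not in grid"
}')
printf '%s\n' "$result"
```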
A membership test never creates an array element, whereas referencing an element always creates it, if it does not already exist. Thus, you should write:
if ("Sally" in telephone)
    print "Sally is in the directory"
rather than:
if (telephone["Sally"] != "")
    print "Sally is in the directory"
because the second form installs her in the directory with an empty telephone number, if she is not already there.
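The difference is easy to observe by counting the array elements after each kind of test. This is a minimal sketch; the helper function count() is a hypothetical name introduced here, and it counts with a for loop rather than relying on any array-length extension.

```shell
# Sketch: a membership test leaves the array alone, a reference does not.
result=$(awk '
function count(array,    key, n) {
    n = 0
    for (key in array)
        n++
    return n
}
BEGIN {
    telephone["Carol"] = "555-0136"

    if ("Sally" in telephone)           # does not create an element
        print "unexpected"
    print "after membership test:", count(telephone)

    if (telephone["Sally"] != "")       # creates telephone["Sally"] = ""
        print "unexpected"
    print "after element reference:", count(telephone)
}')
printf '%s\n' "$result"
```

The count rises from 1 to 2 only after the element reference, even though neither test found Sally.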
It is important to distinguish finding an index from finding a particular value. The index membership test requires constant time, whereas a search for a value takes time proportional to the number of elements in the array, as illustrated by the for loop in the break statement example in the previous section. If you need to do both of these operations frequently, it is worthwhile to construct an inverted-index array:
for (name in telephone)
    name_by_telephone[telephone[name]] = name
You can then use name_by_telephone["555-0136"] to find "Carol" in constant time. Of course, this assumes that all values are unique: if two people share a telephone, the name_by_telephone array records only the last name stored. You can solve that problem with just a bit more code:
for (name in telephone)
{
    if (telephone[name] in name_by_telephone)
        name_by_telephone[telephone[name]] = \
            name_by_telephone[telephone[name]] "\t" name
    else
        name_by_telephone[telephone[name]]