Classic Shell Scripting - Arnold Robbins [139]
As in most programming languages, awk functions can call themselves: this is known as recursion. Obviously, the programmer must make some provision for eventual termination: this is usually done by making the job smaller for each successive invocation so that at some point, no further recursion is needed. Example 9-3 shows a famous example from elementary number theory that uses a method credited to the Greek mathematician Euclid (ca. 300 BCE), but probably known at least 200 years earlier, to find the greatest common denominator of two integers.
Example 9-3. Euclid's greatest common denominator algorithm
function gcd(x, y, r)
{
# return the greatest common denominator of integer x, y
x = int(x)
y = int(y)
# print x, y
r = x % y
return (r = = 0) ? y : gcd(y, r)
}
If we add this action
{ g = gcd($1, $2); print "gcd(" $1 ", " $2 ") =", g }
to the code in Example 9-3 and then we uncomment the print statement and run it from a file, we can see how the recursion works:
$ echo 25770 30972 | awk -f gcd.awk
25770 30972
30972 25770
25770 5202
5202 4962
4962 240
240 162
162 78
78 6
gcd(25770, 30972) = 6
Euclid's algorithm always takes relatively few steps, so there is no danger of overflowing the call stack inside awk that keeps track of the nested function-call history. However, that is not always the case. There is a particularly nasty function discovered by the German mathematician Wilhelm Ackermann[3] in 1926 whose value, and recursion depth, grow much faster than exponentially. It can be defined in awk with the code in Example 9-4.
Example 9-4. Ackermann's worse-than-exponential function
function ack(a, b)
{
N++ # count recursion depth
if (a = = 0)
return (b + 1)
else if (b = = 0)
return (ack(a - 1, 1))
else
return (ack(a - 1, ack(a, b - 1)))
}
If we augment it with a test action:
{ N = 0; print "ack(" $1 ", " $2 ") = ", ack($1, $2), "[" N " calls]" }
and run it from a test file, we find:
$ echo 2 2 | awk -f ackermann.awk
ack(2, 2) = 7 [27 calls]
$ echo 3 3 | awk -f ackermann.awk
ack(3, 3) = 61 [2432 calls]
$ echo 3 4 | awk -f ackermann.awk
ack(3, 4) = 125 [10307 calls]
$ echo 3 8 | awk -f ackermann.awk
ack(3, 8) = 2045 [2785999 calls]
ack(4, 4) is completely uncomputable.
* * *
[3] See http://mathworld.wolfram.com/AckermannFunction.html for background and history of the Ackermann function.
String Functions
In Section 9.3.2 we introduced the length( string ) function, which returns the length of a string string. Other common string operations include concatenation, data formatting, lettercase conversion, matching, searching, splitting, string substitution, and substring extraction.
Substring Extraction
The substring function, substr( string, start, len ), returns a copy of the substring of len characters from string starting from character start. Character positions are numbered starting from one: substr("abcde", 2, 3) returns "bcd". The len argument can be omitted, in which case, it defaults to length( string ) - start + 1, selecting the remainder of the string.
It is not an error for the arguments of substr( ) to be out of bounds, but the result may be implementation-dependent. For example, nawk and gawk evaluate substr("ABC", -3, 2) as "AB", whereas mawk produces the empty string "". All of them produce an empty string for substr("ABC", 4, 2) and for substr("ABC", 1, 0). gawk's —lint option diagnoses out-of-bounds arguments in substr( ) calls.
Lettercase Conversion
Some alphabets have uppercase and lowercase forms of each letter, and in string searching and matching, it is often desirable to ignore case differences. awk provides two functions for this purpose: tolower( string ) returns a copy of string with all characters replaced by their lowercase equivalents, and toupper( string ) returns a copy with uppercase equivalents. Thus, tolower("aBcDeF123") returns "abcdef123", and toupper("aBcDeF123") returns "ABCDEF123". These functions are fine for ASCII letters, but they do not correctly case-convert accented