Classic Shell Scripting - Arnold Robbins [138]
    system("cat <

It produces the output expected when copying the here document to standard output:

    uno
    dos
    tres

Because each call to system() starts a fresh shell, there is no simple way to pass data between commands in separate calls to system(), other than via intermediate files. There is an easy solution to this problem: use an output pipeline to the shell to send multiple commands:

    shell = "/usr/local/bin/ksh"
    print "export INPUTFILE=/var/tmp/myfile.in" | shell
    print "export OUTPUTFILE=/var/tmp/myfile.out" | shell
    print "env | grep PUTFILE" | shell
    close(shell)

This approach has the added virtue that you get to choose the shell, but has the drawback that you cannot portably retrieve the exit-status value.

User-Defined Functions

The awk statements that we have covered so far are sufficient to write almost any data processing program. Because human programmers are poor at understanding large blocks of code, we need a way to split such blocks into manageable chunks that each perform an identifiable job. Most programming languages provide this ability, through features variously called functions, methods, modules, packages, and subroutines. For simplicity, awk provides only functions. As in C, awk functions can optionally return a scalar value. Only a function's documentation, or its code, if quite short, can make clear whether the caller should expect a returned value.

Functions can be defined anywhere in the program at top level: before, between, or after pattern/action groups. In single-file programs, it is conventional to place all functions after the pattern/action code, and it is usually most convenient to keep them in alphabetical order. awk does not care about these conventions, but people do.

A function definition looks like this:

    function name(arg1, arg2, ..., argn)
    {
        statement(s)
    }

The named arguments are used as local variables within the function body, and they hide any global variables of the same name.
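To make the hiding of global variables concrete, here is a minimal sketch. The function name show() and the variable n are our own illustrative choices, not names from the text, and we assume only that a POSIX-conformant awk is on the search path.

```shell
# Illustrative sketch: show() and n are hypothetical names chosen for this demo.
out=$(awk '
function show(n)
{
    n = n + 1                   # changes only the argument n, which is local
    return n
}

BEGIN {
    n = 10                      # a global variable that happens to share the name
    print "inside:", show(1)    # the local n starts with the value 1
    print "global:", n          # the global n is untouched by the call
}')
printf '%s\n' "$out"
```

The first output line reports the value computed from the argument, and the second shows that the global n still holds 10 after the call.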
The function may be used elsewhere in the program by calls of the form:

    name(expr1, expr2, ..., exprn)                Ignore any return value
    result = name(expr1, expr2, ..., exprn)       Save return value in result

The expressions at the point of each call provide initial values for the function-argument variables. The parenthesized argument list must immediately follow the function name, without any intervening whitespace.

Changes made to scalar arguments are not visible to the caller, but changes made to arrays are visible. In other words, scalars are passed by value, whereas arrays are passed by reference: the same is true of the C language.

A return expression statement in the function body terminates execution of the body, and returns control to the point of the call, with the value of expression. If expression is omitted, then the returned value is implementation-defined. All of the systems that we tested returned either a numeric zero, or an empty string. POSIX does not address the issue of a missing return statement or value.

All variables used in the function body that do not occur in the argument list are global. awk permits a function to be called with fewer arguments than declared in the function definition; the extra arguments then serve as local variables. Such variables are commonly needed, so it is conventional to list them in the function argument list, prefixed by some extra whitespace, as shown in Example 9-2. Like all other variables in awk, the extra arguments are initialized to an empty string at function entry.

Example 9-2. Searching an array for a value

    function find_key(array, value,    key)
    {
        # Search array[] for value, and return key such that
        # array[key] == value, or return "" if value is not found

        for (key in array)
            if (array[key] == value)
                return key
        return ""
    }

Failure to list local variables as extra function arguments leads to hard-to-find bugs when they clash with variables used in calling code.
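A short sketch shows how such a clash can arise. The functions sum_bad() and sum_good() and the loop variable i are hypothetical names that we introduce purely for illustration; the two functions differ only in whether i appears in the argument list.

```shell
# Illustrative sketch: sum_bad(), sum_good(), and i are hypothetical names.
out=$(awk '
# sum_bad() omits i from its argument list, so the loop uses the global i.
function sum_bad(n,    total)
{
    total = 0
    for (i = 1; i <= n; i++)
        total += i
    return total
}

# sum_good() lists i after the extra whitespace, which makes it local.
function sum_good(n,    total, i)
{
    total = 0
    for (i = 1; i <= n; i++)
        total += i
    return total
}

BEGIN {
    i = 100                             # variable used by the calling code
    sum_good(3)
    print "after sum_good: i =", i      # the caller value survives
    sum_bad(3)
    print "after sum_bad: i =", i       # the caller value has been overwritten
}')
printf '%s\n' "$out"
```

After the call to sum_good(), the caller's i is still 100. After the call to sum_bad(), it holds the value left behind by the loop, which is exactly the kind of silent side effect that is hard to trace in a larger program.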
gawk provides the --dump-variables option to help you check