Classic Shell Scripting - Arnold Robbins [142]
function join(array, n, fs,    k, s)
{
    # Recombine array[1]...array[n] into a string, with elements
    # separated by fs

    if (n >= 1)
    {
        s = array[1]
        for (k = 2; k <= n; k++)
            s = s fs array[k]
    }
    return (s)
}
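As a quick check of join( ), the following sketch splits a string into an array and then recombines it with a different separator. The sample string "usr:local:bin" and the "-" separator are made up for illustration; the function body is the one shown above.

```shell
# Sketch: split a colon-separated string, then rebuild it with join().
# The input string and the "-" separator are illustrative assumptions.
joined=$(awk '
function join(array, n, fs,    k, s)
{
    if (n >= 1)
    {
        s = array[1]
        for (k = 2; k <= n; k++)
            s = s fs array[k]
    }
    return (s)
}
BEGIN {
    n = split("usr:local:bin", parts, ":")   # parts[1]..parts[3]
    print join(parts, n, "-")
}')
echo "$joined"
```

With the sample input this should print usr-local-bin. When n is less than 1, s is never assigned, so join( ) returns an empty string.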
String Formatting
The last string functions that we present format numbers and strings under user control: sprintf(format, expression1, expression2, ...) returns the formatted string as its function value. printf( ) works the same way, except that it prints the formatted string on standard output, or to a file when its output is redirected, instead of returning it as a function value. Newer programming languages replace format control strings with potentially more powerful formatting functions, but at a significant increase in code verbosity. For typical text processing applications, sprintf( ) and printf( ) are nearly always sufficient.
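The difference between the two functions can be sketched as follows: sprintf( ) hands the formatted string back for further processing, while printf( ) writes it out. The format string and the sample values here are arbitrary choices for illustration.

```shell
# Sketch: sprintf() returns a string; printf() prints one.
# The format and the values are illustrative only.
out=$(awk 'BEGIN {
    s = sprintf("%-6s|%5.2f|%03d", "pi", 3.14159, 7)   # returned, nothing printed yet
    printf("%s\n", s)                                  # now written to standard output
}')
echo "$out"
```

Here %-6s left-justifies the string in six columns, %5.2f rounds to two decimal places in a field of width five, and %03d pads the integer with leading zeros, so the line printed should be "pi    | 3.14|007".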
printf( ) and sprintf( ) format strings are similar to those of the shell printf command that we described in Section 7.4. We summarize the awk format items in Table 9-4. These items can each be augmented by the same field width, precision, and flag modifiers discussed in Chapter 7.
The %i, %u, and %X items were not part of the 1987 language redesign, but modern implementations support them. Despite the similarity with the shell printf command, awk's handling of the %c format item differs for integer arguments, and output with %u for negative arguments may disagree because of differences in shell and awk arithmetic.
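The %c difference can be seen in a small experiment. This is a sketch that assumes an ASCII-based character set and a POSIX-style shell printf; the values 65 and "ABC" are arbitrary.

```shell
# Sketch: %c with string and integer arguments, in awk and in the shell.
# Assumes an ASCII-based host character set.
awk_str=$(awk 'BEGIN { printf("%c", "ABC") }')   # first character of the string
awk_int=$(awk 'BEGIN { printf("%c", 65) }')      # character whose number is 65
sh_int=$(printf '%c' 65)                         # shell sees the string "65"
echo "$awk_str $awk_int $sh_int"
```

On an ASCII system this should print "A A 6": awk converts the integer 65 to the character with that code, whereas the shell command takes its argument as a string and prints only its first character. Output from %u with negative arguments is left out of the sketch because it varies between implementations.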
Table 9-5. printf and sprintf format specifiers

Item      Description
%c        ASCII character. Print the first character of the corresponding string argument, or the character whose number in the host character set is the corresponding integer argument, usually taken modulo 256.
%d, %i    Decimal integer.
%e        Floating-point format ([-]d.precision e[+-]dd).
%f        Floating-point format ([-]ddd.precision).
%g        %e or %f conversion, whichever is shorter, with trailing zeros removed.
%o        Unsigned octal value.
%s        String.
%u        Unsigned value. awk numbers are floating-point values: small negative integer values are output as large positive ones because the sign bit is interpreted as a data bit.
%x        Unsigned hexadecimal number. Letters a-f represent 10 to 15.
%X        Unsigned hexadecimal number. Letters A-F represent 10 to 15.
%%        Literal %.
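A few of these format items can be exercised in one short run. The sketch below uses arbitrary sample values (255, 1234.5, and the string "awk") and leaves out %i and %u, whose support and output vary more between implementations.

```shell
# Sketch: several format items applied to sample values.
# The values 255, 1234.5, and "awk" are illustrative only.
out=$(awk 'BEGIN {
    printf("%d %o %x %X\n", 255, 255, 255, 255)    # decimal, octal, hex, HEX
    printf("%e %f %g\n", 1234.5, 1234.5, 1234.5)   # three floating-point styles
    printf("%s %c 100%%\n", "awk", "awk")          # string, first char, literal %
}')
echo "$out"
```

The expected output is three lines: "255 377 ff FF", then "1.234500e+03 1234.500000 1234.5", then "awk a 100%". Note how %g drops the trailing zeros that %e and %f keep at their default precision of six.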
Most of the format items are straightforward. However, we caution that accurate conversion of binary floating-point values to decimal strings, and the reverse, is a surprisingly difficult problem whose proper solution was only found in about 1990, and can require very high intermediate precision. awk implementations generally use the underlying C library for the conversions required by sprintf( ) format items, and although library quality continues to improve, there are still platforms in which the accuracy of floating-point conversions is deficient. In addition, differences in floating-point hardware and instruction evaluation order mean that floating-point results from almost any programming language vary slightly across different architectures.
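The gap between a binary floating-point value and its decimal representation is easy to expose by asking for more digits than the default. This sketch assumes IEEE 754 double-precision arithmetic and a C library with accurate conversions; on other platforms the digits may differ, which is exactly the caution above.

```shell
# Sketch: the same value printed with 6 and with 17 significant digits.
# Assumes IEEE 754 doubles and an accurate C library conversion.
out=$(awk 'BEGIN {
    x = 0.1 + 0.2
    printf("%.6g %.17g\n", x, x)
    printf("%s\n", (x == 0.3) ? "equal" : "not equal")
}')
echo "$out"
```

On such a system the first line should read "0.3 0.30000000000000004" and the second "not equal": the sum is not exactly the double nearest to 0.3, even though six-digit output hides the difference.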
When floating-point numbers appear in print statements, awk formats them according to the value of the built-in variable OFMT, which defaults to "%.6g". You can redefine OFMT as needed.
Similarly, when floating-point numbers are converted to strings by concatenation, awk formats them according to the value of another built-in variable, CONVFMT.[4] Its default value is also "%.6g".
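The separate roles of the two variables can be sketched by giving them different values. This small experiment is not the book's Example 9-8; the formats "%.2f" and "%.4f" and the value 123.456789 are chosen only to make the two conversions distinguishable, and an awk that supports CONVFMT is assumed.

```shell
# Sketch: OFMT governs print; CONVFMT governs number-to-string conversion.
# The two formats and the sample value are illustrative only.
out=$(awk 'BEGIN {
    OFMT = "%.2f"
    CONVFMT = "%.4f"
    x = 123.456789
    print x        # number in a print statement: formatted with OFMT
    s = x ""       # concatenation converts x to a string with CONVFMT
    print s        # s is already a string, so OFMT does not apply
    print 17       # integer values are printed as integers
}')
echo "$out"
```

With a POSIX-conforming awk this should print 123.46, 123.4568, and 17 on separate lines.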
The test program in Example 9-8 produces output like this with a recent nawk version on a Sun Solaris SPARC system:
$ nawk -f ofmt.awk
[ 1] OFMT = "%.6g" 123.457
[ 2] OFMT = "%d" 123
[ 3] OFMT = "%e" 1.234568e+02
[ 4] OFMT = "%f" 123.456789
[ 5] OFMT = "%g" 123.457
[ 6] OFMT = "%25.16e" 1.2345678901234568e+02
[ 7] OFMT = "%25.16f" 123.4567890123456806
[ 8] OFMT = "%25.16g" 123.4567890123457
[ 9] OFMT = "%25d" 123
[10] OFMT = "%.25d" 0000000000000000000000123
[11] OFMT = "%25d" 2147483647
[12] OFMT = "%25d" 2147483647