Classic Shell Scripting - Arnold Robbins [132]
$ echo 'one two three four' | awk '{ print $1, $2, $3 }'
one two three
$ echo 'one two three four' | awk '{ OFS = "..."; print $1, $2, $3 }'
one...two...three
$ echo 'one two three four' | awk '{ OFS = "\n"; print $1, $2, $3 }'
one
two
three
Changing the output field separator without assigning any field does not alter $0:
$ echo 'one two three four' | awk '{ OFS = "\n"; print $0 }'
one two three four
However, if we change the output field separator, and we assign at least one of the fields (even if we do not change its value), then we force reassembly of the record with the new field separator:
$ echo 'one two three four' | awk '{ OFS = "\n"; $1 = $1; print $0 }'
one
two
three
four
One-Line Programs in awk
We have now covered enough awk to do useful things with as little as one line of code; few other programming languages can do so much with so little. In this section, we present some examples of these one-liners, although page-width limitations sometimes force us to wrap them onto more than one line. In some of the examples, we show multiple ways to program a solution in awk, or with other Unix tools:
We start with a simple implementation in awk of the Unix word-count utility, wc:
awk '{ C += length($0) + 1; W += NF } END { print NR, W, C }'
Notice that pattern/action groups need not be separated by newlines, even though we usually do that for readability. Although we could have included an initialization block of the form BEGIN { C = W = 0 }, awk's guaranteed default initializations make it unnecessary. The character count in C is updated at each record to count the record length, plus the newline that is the default record separator. The word count in W accumulates the number of fields. We do not need to keep a line-count variable because the built-in record count, NR, automatically tracks that information for us. The END action handles the printing of the one-line report that wc produces.
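As a quick sanity check, we can run the one-liner side by side with the real wc on a small sample (the sample text and the /tmp filename are only illustrative):

```shell
# Two lines, five words, 24 characters (including the two newlines).
printf 'hello world\nfoo bar baz\n' > /tmp/sample.txt

wc /tmp/sample.txt
awk '{ C += length($0) + 1; W += NF } END { print NR, W, C }' /tmp/sample.txt
```

The awk version prints `2 5 24`; wc reports the same three counts, though its column spacing varies between implementations.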
awk exits immediately without reading any input if its program is empty, so it can match cat as an efficient data sink:
$ time cat *.xml > /dev/null
0.035u 0.121s 0:00.21 71.4% 0+0k 0+0io 99pf+0w
$ time awk '' *.xml
0.136u 0.051s 0:00.21 85.7% 0+0k 0+0io 140pf+0w
Apart from issues with NUL characters, awk can easily emulate cat—these two examples produce identical output:
cat *.xml
awk 1 *.xml
To print original data values and their logarithms for one-column datafiles, use this:
awk '{ print $1, log($1) }' file(s)
To print a random sample of about 5 percent of the lines from text files, use the pseudorandom-number generator function (see Section 9.10), which produces a result uniformly distributed between zero and one:
awk 'rand() < 0.05' file(s)
Reporting the sum of the n-th column in tables with whitespace-separated columns is easy:
awk -v COLUMN=n '{ sum += $COLUMN } END { print sum }' file(s)
A minor tweak instead reports the average of column n:
awk -v COLUMN=n '{ sum += $COLUMN } END { print sum / NR }' file(s)
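For instance, with a small made-up table, summing and averaging column 2 look like this:

```shell
# Sum of column 2: 10 + 20 + 30 = 60
printf '1 10\n2 20\n3 30\n' |
  awk -v COLUMN=2 '{ sum += $COLUMN } END { print sum }'

# Average of column 2: 60 / 3 = 20
printf '1 10\n2 20\n3 30\n' |
  awk -v COLUMN=2 '{ sum += $COLUMN } END { print sum / NR }'
```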
To print the running total for expense files whose records contain a description and an amount in the last field, use the built-in variable NF in the computation of the total:
awk '{ sum += $NF; print $0, sum }' file(s)
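With some made-up expense records (description first, amount in the last field), the running total accumulates like this:

```shell
# Each output line is the original record followed by the total so far.
printf 'lunch 12.50\ntaxi 8.25\nbook 30.00\n' |
  awk '{ sum += $NF; print $0, sum }'
# lunch 12.50 12.5
# taxi 8.25 20.75
# book 30.00 50.75
```

Because no field is assigned, $0 is printed unchanged; only the appended total is new.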
Here are three ways to search for text in files:
egrep 'pattern|pattern' file(s)
awk '/pattern|pattern/' file(s)
awk '/pattern|pattern/ { print FILENAME ":" FNR ":" $0 }' file(s)
If you want to restrict the search to just lines 100-150, you can use two tools and a pipeline, albeit with loss of location information:
sed -n -e 100,150p -s file(s) | egrep 'pattern'
We need GNU sed here for its -s option, which restarts line numbering for each file. Alternatively, you can use awk with a fancier pattern:
awk '(100 <= FNR) && (FNR <= 150) && /pattern/ \
{ print FILENAME ":" FNR ":" $0 }' file(s)
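To see the range restriction at work, try it on numbered input from seq, where FNR and the line's content coincide (the /tmp filename is only illustrative):

```shell
seq 200 > /tmp/nums.txt
awk '(100 <= FNR) && (FNR <= 150) && /11/ \
    { print FILENAME ":" FNR ":" $0 }' /tmp/nums.txt
```

Only lines 110 through 119 are printed, since those are the only lines in the range whose text contains 11; each appears as /tmp/nums.txt:110:110 and so on.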
To swap the second and third columns in a four-column table, assuming tab separators, use any of these:
awk -F'\t'