
Classic Shell Scripting - Arnold Robbins [133]

-v OFS='\t' '{ print $1, $3, $2, $4 }' old > new

awk 'BEGIN { FS = OFS = "\t" } { print $1, $3, $2, $4 }' old > new

awk -F'\t' '{ print $1 "\t" $3 "\t" $2 "\t" $4 }' old > new

To convert column separators from tab (shown here as ·) to ampersand, use either of these:

sed -e 's/·/\&/g' file(s)

awk 'BEGIN { FS = "\t"; OFS = "&" } { $1 = $1; print }' file(s)
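The assignment $1 = $1 in the awk version is what makes the new separator take effect: awk rebuilds $0 with OFS only when a field is modified. A minimal sketch of the difference, using a made-up three-field record and a helper name, tabs_to_amp, chosen only for this illustration:

```sh
# tabs_to_amp is a hypothetical helper name; it reads files or standard input.
tabs_to_amp() {
    awk 'BEGIN { FS = "\t"; OFS = "&" } { $1 = $1; print }' "$@"
}

# With the assignment, the record is rebuilt using OFS.
printf 'a\tb\tc\n' | tabs_to_amp

# Without it, $0 is printed unchanged and the tabs survive.
printf 'a\tb\tc\n' | awk 'BEGIN { FS = "\t"; OFS = "&" } { print }'
```

Piping the second command through od -c is an easy way to confirm that the tabs are still there.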

Both of these pipelines eliminate duplicate lines from a sorted stream:

sort file(s) | uniq

sort file(s) | awk 'Last != $0 { print } { Last = $0 }'
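One edge case separates the two pipelines: an uninitialized awk variable compares equal to the empty string, so if the sorted stream begins with empty lines, the awk filter drops all of them, whereas uniq keeps one. The following is a hedged variant, not the original one-liner, that also prints the first record unconditionally; dedupe_sorted is a name invented for this sketch:

```sh
# dedupe_sorted is a hypothetical helper; input must already be sorted.
dedupe_sorted() {
    # NR == 1 guards the first record, so a leading empty line is kept once.
    awk 'NR == 1 || Last != $0 { print } { Last = $0 }' "$@"
}

# Made-up sample with duplicate lines, including two empty ones.
printf 'b\n\na\nb\n\n' | LC_ALL=C sort | dedupe_sorted
```

For input without empty lines, this behaves the same as the shorter form.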

To convert carriage-return/newline line terminators to newline terminators, use one of these:

sed -e 's/\r$//' file(s)

sed -e 's/^M$//' file(s)

mawk 'BEGIN { RS = "\r\n" } { print }' file(s)

The first sed example needs a modern version that recognizes escape sequences. In the second example, ^M represents a literal Ctrl-M (carriage return) character. For the third example, we need either gawk or mawk because nawk and POSIX awk do not support more than a single character in RS.
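Where neither a modern sed nor a multicharacter RS is available, a per-record substitution is one alternative. This sketch assumes only that awk understands the \r escape in regular expressions, which POSIX specifies; crlf_to_lf is a hypothetical name:

```sh
# crlf_to_lf is a hypothetical helper; it reads files or standard input.
crlf_to_lf() {
    # Remove one trailing carriage return from each record, if present.
    awk '{ sub(/\r$/, ""); print }' "$@"
}

# Made-up two-line sample; od -c makes the line terminators visible.
printf 'one\r\ntwo\r\n' | crlf_to_lf | od -c
```

Carriage returns elsewhere in a line are left alone, because the pattern is anchored to the end of the record.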

To convert single-spaced text lines to double-spaced lines, use any of these:

sed -e 's/$/\n/' file(s)

awk 'BEGIN { ORS = "\n\n" } { print }' file(s)

awk 'BEGIN { ORS = "\n\n" } 1' file(s)

awk '{ print $0 "\n" }' file(s)

awk '{ print; print "" }' file(s)

As before, we need a modern sed version. Notice how a simple change to the output record separator, ORS, in the first awk example solves the problem: the rest of the program just prints each record. The last two awk solutions require more processing for each record, and are usually slower than the first one.

Conversion of double-spaced lines to single spacing is equally easy:

gawk 'BEGIN { RS="\n *\n" } { print }' file(s)
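The gawk solution depends on a regular expression in RS. For an awk without that feature, one possible sketch skips a blank (or spaces-only) line only when it directly follows a printed line, so that blank lines present in the original single-spaced text survive a round trip. The helper names double_space and single_space are assumptions made for this example:

```sh
# Hypothetical helpers; both read files or standard input.
double_space() {
    awk 'BEGIN { ORS = "\n\n" } { print }' "$@"
}

single_space() {
    # Skip a blank line only if the previous line was printed.
    awk '/^ *$/ && just_printed { just_printed = 0; next }
         { print; just_printed = 1 }' "$@"
}

# Round trip on a made-up sample that contains a genuine blank line.
printf 'alpha\nbeta\n\ngamma\n' | double_space | single_space
```

The round trip should reproduce the four input lines, including the blank one between beta and gamma.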

To locate lines in Fortran 77 programs that exceed the 72-character line-length limit,[2] either of these does the job:

egrep -n '^.{73,}' *.f

awk 'length($0) > 72 { print FILENAME ":" FNR ":" $0 }' *.f

We need a POSIX-compliant egrep for the extended regular expression that matches 73 or more of any character.
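The awk test generalizes easily to other limits. The sketch below passes the limit in with -v and reports the offending length instead of the line itself; long_lines and the temporary sample file are assumptions made for this illustration:

```sh
# long_lines LIMIT FILE... is a hypothetical helper.
long_lines() {
    limit=$1
    shift
    awk -v limit="$limit" \
        'length($0) > limit { print FILENAME ":" FNR ":" length($0) }' "$@"
}

# Made-up sample: one short line and one line of 73 zeros.
tmp=$(mktemp) || exit 1
printf '%s\n' 'short line' "$(printf '%073d' 0)" > "$tmp"
long_lines 72 "$tmp"
rm -f "$tmp"
```

Only the second line of the sample exceeds 72 characters, so it is the only one reported.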

To extract properly hyphenated International Standard Book Number (ISBN) values from documents, we need a lengthy, but straightforward, regular expression, with the record separator set to match all characters that cannot be part of an ISBN:

gawk 'BEGIN { RS = "[^-0-9Xx]" }

/[0-9][-0-9][-0-9][-0-9][-0-9][-0-9][-0-9][-0-9][-0-9][-0-9][-0-9]-[0-9Xx]/' \

file(s)

With a POSIX-conformant awk, that long regular expression can be shortened to /[0-9][-0-9]{10}-[0-9Xx]/. Our tests found that gawk --posix, HP/Compaq/DEC OSF/1 awk, Hewlett-Packard HP-UX awk, IBM AIX awk, and Sun Solaris /usr/xpg4/bin/awk are the only ones that support the POSIX extension of braced interval expressions in regular expressions.
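When the available awk supports neither a regular expression in RS nor interval expressions, the same idea can be sketched with other tools. This version swaps in tr to break the input at every character that cannot be part of an ISBN, and grep -E to apply the interval expression; extract_isbn and the sample sentence are assumptions made for the illustration:

```sh
# extract_isbn is a hypothetical helper; it reads files or standard input.
extract_isbn() {
    # Turn every character that cannot be part of an ISBN into a newline,
    # then keep the lines that contain the 13-character hyphenated pattern.
    cat -- "$@" | tr -c '0-9Xx-' '\n' | grep -E '[0-9][-0-9]{10}-[0-9Xx]'
}

# Made-up sample sentence containing one hyphenated ISBN-like value.
printf 'See ISBN 0-596-00595-4, page 12.\n' | extract_isbn
```

As with the awk original, the pattern checks only the shape of the value, not its check digit.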

To strip angle-bracketed markup tags from HTML documents, treat the tags as record separators, like this:

mawk 'BEGIN { ORS = " "; RS = "<[^<>]*>" } { print }' *.html

Setting ORS to a space converts each markup tag to a space, while all input line breaks are preserved.

Here is how we can extract all of the titles from a collection of XML documents, such as the files for this book, and print them, one title per line, with surrounding markup. This program works correctly even when the titles span multiple lines, and handles the uncommon, but legal, case of spaces between the tag word and the closing angle bracket:

$ mawk -v ORS=' ' -v RS='[ \n]' '/<title *>/, /<\/title *>/' *.xml |

> sed -e 's@</title *> *@&\n@g'

...

<title>Enough awk to Be Dangerous</title>

<title>Freely available awk versions</title>

<title>The awk Command Line</title>

...

The awk program produces a single line of output, so the modern sed filter supplies the needed line breaks. We could eliminate sed here, but to do so, we need some awk statements discussed in the next section.

* * *

[2] The Fortran line-length limit was not a problem in the old days of punched cards, but once screen-based editing became common, it became a source of nasty bugs caused by the compiler's silently ignoring statement text beyond
