Online Book Reader

Home Category

Classic Shell Scripting - Arnold Robbins [130]

By Root 924 0
by newlines or whatever FS is set to.

gawk and mawk provide an important extension: RS may be a regular expression, provided that it is longer than a single character. Thus, RS = "+" matches a literal plus, whereas RS = ":+" matches one or more colons. This provides much more powerful record specification, which we exploit in some of the examples in Section 9.6.

With a regular expression record separator, the text that matches the separator can no longer be determined from the value of RS. gawk provides it as a language extension in the built-in variable RT, but mawk does not.

Without the extension of RS to regular expressions, it can be hard to simulate regular expressions as record separators, if they can match across line boundaries, because most Unix text processing tools deal with a line at a time. Sometimes, you can use tr to convert newline into an otherwise unused character, making the data stream one giant line. However, that often runs afoul of buffer-size limits in other tools. gawk, mawk, and emacs are unusual in freeing you from the limiting view of line-oriented data.

Field Separators

Fields are separated from each other by strings that match the current value of the field-separator regular expression, available in the built-in variable FS.

The default value of FS, a single space, receives special interpretation: it means one or more whitespace characters (space or tab), and leading and trailing whitespace on the line is ignored. Thus, the input lines:

alpha beta gamma

alpha beta gamma

both look the same to an awk program with the default setting of FS: three fields with values "alpha", "beta", and "gamma". This is particularly convenient for input prepared by humans.

For those rare occasions when a single space separates fields, simply set FS = "[ ]" to match exactly one space. With that setting, leading and trailing whitespace is no longer ignored. These two examples report different numbers of fields (two spaces begin and end the input record):

$ echo ' un deux trois ' | awk -F' ' '{ print NF ":" $0 }'

3: un deux trois

$ echo ' un deux trois ' | awk -F'[ ]' '{ print NF ":" $0 }'

7: un deux trois

The second example sees seven fields: "", "", "un", "deux", "trois", "", and "".

FS is treated as a regular expression only when it contains more than one character. FS = "." uses a period as the field separator; it is not a regular expression that matches any single character.

Modern awk implementations also permit FS to be an empty string. Each character is then a separate field, but in older implementations, each record then has only one field. POSIX says only that the behavior for an empty field separator is unspecified.

Fields

Fields are available to the awk program as the special names $1, $2, $3, ..., $NF. Field references need not be constant, and they are converted (by truncation) to integer values if necessary: assuming that k is 3, the values $k, $(1+2), $(27/9), $3.14159, $"3.14159", and $3 all refer to the third field.

The special field name $0 refers to the current record, initially exactly as read from the input stream, and the record separator is not part of the record. References to field numbers above the range 0 to NF are not erroneous: they return empty strings and do not create new fields, unless you assign them a value. References to fractional, or non-numeric, field numbers are implementation-defined. References to negative field numbers are fatal errors in all implementations that we tested. POSIX says only that references to anything other than non-negative integer field numbers are unspecified.

Fields can be assigned too, just like normal variables. For example, $1 = "alef" is legal, but has an important side effect: if the complete record is subsequently referenced, it is reassembled from the current values of the fields, but separated by the string given by the output-field-separator built-in variable, OFS, which defaults to a single space.

Patterns and Actions

Patterns and actions form the heart of awk programming. It is awk's unconventional

Return Main Page Previous Page Next Page

®Online Book Reader