Online Book Reader


Classic Shell Scripting - Arnold Robbins [50]

make it easy to reformat paragraphs by changing line breaks so that lines do not exceed a width that is comfortable for a human to read; we used such commands a lot in writing this book. Sometimes you need to do this to a data stream in a shell script, or inside an editor that lacks a reformatting command but does have a shell escape. In this case, fmt is what you need. Although POSIX makes no mention of fmt, you can find it on every current flavor of Unix; if you have an older system that lacks fmt, simply install the GNU coreutils package.
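As a small illustration of the script case, the sketch below rewraps a string held in a shell variable by piping it through fmt with its default width. The variable name msg and its text are hypothetical, chosen only for illustration. In an editor with a shell escape the idea is the same. In vi, for instance, typing !}fmt filters the text from the cursor to the end of the paragraph through fmt.

```sh
# 'msg' is a hypothetical variable holding one overly long line of text.
msg='This note arrived as one single line of text, which is far too wide to read comfortably in an ordinary terminal window.'

# fmt splits the long line so that no output line exceeds its default width.
printf '%s\n' "$msg" | fmt
```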

Although some implementations of fmt have more options, only two find frequent use: -s means split long lines only, but do not join short lines to make longer ones, and -w n sets the output line width to n characters (default: usually about 75). Here are some examples with chunks of a spelling dictionary that has just one word per line:

$ sed -n -e 9991,10010p /usr/dict/words | fmt          # Reformat 20 dictionary words
Graff graft graham grail grain grainy grammar grammarian grammatic
granary grand grandchild grandchildren granddaughter grandeur grandfather
grandiloquent grandiose grandma grandmother

$ sed -n -e 9995,10004p /usr/dict/words | fmt -w 30    # Reformat 10 words into short lines
grain grainy grammar
grammarian grammatic
granary grand grandchild
grandchildren granddaughter

If your system does not have /usr/dict/words, then it probably has an equivalent file named /usr/share/dict/words or /usr/share/lib/dict/words.
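A script that must run on several systems can look for the file itself. The sketch below checks the three locations named above and prints the first regular, readable file it finds; the function name find_dict is hypothetical, not a standard command. Keep in mind that these dictionaries differ in content, so the same sed line range will select different words on different systems.

```sh
# find_dict: hypothetical helper. Print the first regular, readable file
# among its arguments; with no arguments, try the three usual locations.
find_dict() {
    [ $# -gt 0 ] || set -- /usr/dict/words /usr/share/dict/words /usr/share/lib/dict/words
    for f in "$@"; do
        if [ -f "$f" ] && [ -r "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1          # nothing found
}

# Usage: rerun the first example against whichever dictionary exists.
if dict=$(find_dict); then
    sed -n -e 9991,10010p "$dict" | fmt
else
    echo "no dictionary file found" >&2
fi
```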

The split-only option, -s, is helpful in wrapping long lines while leaving short lines intact, and thus minimizing the differences from the original version:

$ fmt -s -w 10 << END_OF_DATA       # Reformat long lines only
> one two three four five
> six
> seven
> eight
> END_OF_DATA
one two
three
four five
six
seven
eight
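To see what -s prevents, feed the same four lines through fmt with and without it. Without -s, fmt is free to join the short lines six, seven, and eight into fuller lines. Exact line breaks can differ between fmt implementations, so this sketch shows no expected output; the only firm rule is that no line exceeds 10 characters.

```sh
# Same input as the here-document above, produced with printf instead.
# With -s: long lines are split, short lines are left alone.
printf '%s\n' 'one two three four five' six seven eight | fmt -s -w 10

# Without -s: short lines may be joined, so fewer output lines result.
printf '%s\n' 'one two three four five' six seven eight | fmt -w 10
```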

* * *

Warning


You might expect that you could split an input stream into one word per line with fmt -w 0, or remove line breaks entirely with a large width. Unfortunately, fmt implementations vary in behavior:

Older versions of fmt lack the -w option; they use -n to specify an n-character width.

All reject a zero width, but accept -w 1 or -1.

All preserve leading space.

Some preserve lines that look like mail headers.

Some preserve lines beginning with a dot (troff typesetter commands).

Most limit the width. We found peculiar upper bounds of 1021 (Solaris), 2048 (HP/UX 11), 4093 (AIX and IRIX), 8189 (OSF/1 4.0), 12285 (OSF/1 5.1), and 2147483647 (largest 32-bit signed integer: FreeBSD, GNU/Linux, and Mac OS).

The NetBSD and OpenBSD versions of fmt have a different command-line syntax and apparently allocate a buffer to hold the output line, since they give an out-of-memory diagnostic for large width values.

IRIX fmt is found in /usr/sbin, a directory that is unlikely to be in your search path.

HP/UX before version 11.0 did not have fmt.

These variations make it difficult to use fmt in portable scripts, or for complex reformatting tasks.

* * *
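Given these variations, a portable script may do better to avoid fmt for the two extreme cases described in the warning. The sketch below swaps in two POSIX tools instead: tr to put one word per line, and paste to remove line breaks entirely. The function names words_per_line and join_lines are hypothetical, chosen for illustration.

```sh
# words_per_line: hypothetical helper. Translate spaces and tabs to
# newlines, and squeeze (-s) each resulting run of newlines to just one.
# A line that begins with blanks still produces one empty output line.
words_per_line() {
    tr -s ' \t' '\n\n'
}

# join_lines: hypothetical helper. Join all input lines into a single
# line, separated by spaces; "-" makes paste read standard input.
join_lines() {
    paste -s -d ' ' -
}

printf 'one  two\tthree\n' | words_per_line
printf 'one\ntwo\nthree\n' | join_lines
```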

Counting Lines, Words, and Characters

We have used the word-count utility, wc, a few times before. It is probably one of the oldest, and simplest, tools in the Unix toolbox, and POSIX standardizes it. By default, wc outputs a one-line report of the number of lines, words, and bytes:

$ echo This is a test of the emergency broadcast system | wc     # Report counts
1 9 49

Request a subset of those results with the -c (bytes), -l (lines), and -w (words) options:

$ echo Testing one two three | wc -c     # Count bytes
22
$ echo Testing one two three | wc -l     # Count lines
1
$ echo Testing one two three | wc -w     # Count words
4
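In scripts, a count usually ends up in a variable. This sketch, using a hypothetical helper named count_lines, redirects the file to standard input so that wc prints only the number and no file name, then deletes the padding blanks that some wc versions put in front of it. Note that wc -l counts newline characters, so a final line without a trailing newline is not counted.

```sh
# count_lines: hypothetical helper. Print the number of lines in file $1.
count_lines() {
    wc -l < "$1" | tr -d ' \t'
}

tmpfile=$(mktemp)                        # scratch file for the demonstration
printf 'alpha\nbeta\ngamma\n' > "$tmpfile"
n=$(count_lines "$tmpfile")
echo "The file has $n lines"             # prints: The file has 3 lines
rm -f "$tmpfile"
```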

The -c option originally stood for character count. On modern systems, however, multibyte character-set encodings such as UTF-8 mean that bytes are no longer synonymous with characters, so POSIX introduced the -m option to count multibyte characters. For 8-bit character data, -m gives the same result as -c.
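The difference shows up as soon as the data contains a non-ASCII character. In this sketch the word café is written with octal escapes for its UTF-8 encoding, in which é occupies two bytes. The locale name C.UTF-8 is an assumption; it exists on many GNU/Linux systems, but yours may offer a different UTF-8 locale (see locale -a), and if the named locale is missing, wc falls back to byte semantics.

```sh
# "café" plus a newline: c, a, f, then 0xC3 0xA9 for é, then \n.
printf 'caf\303\251\n' | wc -c                  # 6 bytes
printf 'caf\303\251\n' | LC_ALL=C wc -m         # 6: in the C locale each byte is a character
printf 'caf\303\251\n' | LC_ALL=C.UTF-8 wc -m   # 5 characters, if C.UTF-8 is installed
```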

Although wc is most commonly used with input from a pipeline, it also accepts command-line file arguments, producing a one-line report for each,

