Classic Shell Scripting - Arnold Robbins [50]
Although some implementations of fmt have more options, only two find frequent use: -s means split long lines only, but do not join short lines to make longer ones, and -w n sets the output line width to n characters (default: usually about 75 or so). Here are some examples with chunks of a spelling dictionary that has just one word per line:
$ sed -n -e 9991,10010p /usr/dict/words | fmt      # Reformat 20 dictionary words
Graff graft graham grail grain grainy grammar grammarian grammatic
granary grand grandchild grandchildren granddaughter grandeur grandfather
grandiloquent grandiose grandma grandmother
$ sed -n -e 9995,10004p /usr/dict/words | fmt -w 30      # Reformat 10 words into short lines
grain grainy grammar
grammarian grammatic
granary grand grandchild
grandchildren granddaughter
If your system does not have /usr/dict/words, then it probably has an equivalent file named /usr/share/dict/words or /usr/share/lib/dict/words.
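A script that needs the word list can probe for it rather than hardcode one location. The following is a minimal sketch, assuming that only the three paths named above need to be checked; the function name first_readable is hypothetical, not a standard utility:

```shell
# Hypothetical helper: print the first argument that names a readable file.
# Returns nonzero if none of the candidates is readable.
first_readable () {
    for f in "$@"
    do
        if [ -r "$f" ]
        then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Try the three locations mentioned in the text, in order.
if words=$(first_readable /usr/dict/words \
                          /usr/share/dict/words \
                          /usr/share/lib/dict/words)
then
    sed -n -e 9991,10010p "$words" | fmt
else
    echo "no word list found" >&2
fi
```

The line numbers select different words on different systems, because the dictionaries themselves differ.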
The split-only option, -s, is helpful in wrapping long lines while leaving short lines intact, and thus minimizing the differences from the original version:
$ fmt -s -w 10 << END_OF_DATA      # Reformat long lines only
> one two three four five
> six
> seven
> eight
> END_OF_DATA
one two
three
four five
six
seven
eight
* * *
Warning
You might expect that you could split an input stream into one word per line with fmt -w 0, or remove line breaks entirely with a large width. Unfortunately, fmt implementations vary in behavior:
Older versions of fmt lack the -w option; they use -n to specify an n-character width.
All reject a zero width, but accept -w 1 or -1.
All preserve leading space.
Some preserve lines that look like mail headers.
Some preserve lines beginning with a dot (troff typesetter commands).
Most limit the width. We found peculiar upper bounds of 1021 (Solaris), 2048 (HP/UX 11), 4093 (AIX and IRIX), 8189 (OSF/1 4.0), 12285 (OSF/1 5.1), and 2147483647 (largest 32-bit signed integer: FreeBSD, GNU/Linux, and Mac OS).
The NetBSD and OpenBSD versions of fmt have a different command-line syntax, and apparently allocate a buffer to hold the output line, since they give an out of memory diagnostic for large width values.
IRIX fmt is found in /usr/sbin, a directory that is unlikely to be in your search path.
HP/UX before version 11.0 did not have fmt.
These variations make it difficult to use fmt in portable scripts, or for complex reformatting tasks.
* * *
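For the two jobs mentioned in the warning, splitting input into one word per line and removing line breaks entirely, the standard tr and paste utilities are a possible substitute for fmt. This is only a sketch: the function names are hypothetical, and words here are simply runs of characters separated by spaces, tabs, or newlines:

```shell
# One word per line: turn each space or tab into a newline, then let -s
# squeeze every run of newlines into one. The two sets have equal length
# to avoid tr's unspecified behavior when the second set is shorter.
# Blank lines in the input are squeezed away as well.
words_per_line () {
    tr -s ' \t' '\n\n'
}

# One long line: join all input lines with single spaces.
join_lines () {
    paste -s -d ' ' -
}

printf 'one two  three\nfour\n' | words_per_line
printf 'one\ntwo\nthree\n' | join_lines
```

Unlike fmt, neither function treats leading space, mail headers, or troff commands specially, so input that begins with whitespace yields an empty first line from words_per_line.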
Counting Lines, Words, and Characters
We have used the word-count utility, wc, a few times before. It is probably one of the oldest, and simplest, tools in the Unix toolbox, and POSIX standardizes it. By default, wc outputs a one-line report of the number of lines, words, and bytes:
$ echo This is a test of the emergency broadcast system | wc      # Report counts
1 9 49
Request a subset of those results with the -c (bytes), -l (lines), and -w (words) options:
$ echo Testing one two three | wc -c      # Count bytes
22
$ echo Testing one two three | wc -l      # Count lines
1
$ echo Testing one two three | wc -w      # Count words
4
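In scripts, a single count is often captured in a variable. Some wc implementations pad the number with leading spaces, so a comparison against a bare string can fail unexpectedly. One way around that, sketched here with a hypothetical function name, is to pass the result through arithmetic expansion:

```shell
# Hypothetical helper: print the number of words on standard input
# as a bare integer.
count_words () {
    # Arithmetic expansion discards any padding that wc puts
    # around the number.
    echo $(( $(wc -w) ))
}

n=$(echo Testing one two three | count_words)
echo "The input has $n words"
```

The same pattern should work with -c and -l.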
The -c option originally stood for character count, but with multibyte character-set encodings, such as UTF-8, in modern systems, bytes are no longer synonymous with characters, so POSIX introduced the -m option to count multibyte characters. For 8-bit character data, it is the same as -c.
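The difference shows up as soon as the input contains a multibyte character. The following sketch writes the word café in UTF-8 by giving the two bytes of é as octal escapes, so that the example does not depend on how this text itself is encoded. The -m result depends on the current locale:

```shell
# "café" plus a newline; \303\251 is the two-byte UTF-8 encoding of é.
printf 'caf\303\251\n' | wc -c     # bytes: 6 in any locale
printf 'caf\303\251\n' | wc -m     # characters: 5 in a UTF-8 locale
```

In a locale with a single-byte character set, such as the C locale, wc -m generally counts each of the two bytes separately and reports the same total as wc -c.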
Although wc is most commonly used with input from a pipeline, it also accepts command-line file arguments, producing a one-line report for each,