Classic Shell Scripting - Arnold Robbins [20]
Append to a file with >>
Use program >> file to send program's standard output to the end of file.
Like >, the >> operator creates the destination file if it doesn't exist. However, if the file already exists, it is not truncated; instead, any new data generated by the running program is appended to the end of the file:
for f in dos-file*.txt
do
tr -d '\r' < "$f" >> big-unix-file.txt
done
(The for loop is described in Section 6.4.)
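The difference between > and >> is easy to see with a scratch file. The following is a small sketch, assuming a system where mktemp -d is available; the directory and the file name notes.txt are hypothetical and used only for illustration.

```sh
#!/bin/sh
# Show that >> creates a missing file and appends to an existing one,
# while > truncates. The scratch directory and file name are illustrative.
dir=$(mktemp -d) || exit 1
log="$dir/notes.txt"

echo "first line"  >> "$log"   # file does not exist yet: >> creates it
echo "second line" >> "$log"   # file exists: new data goes at the end
appended=$(cat "$log")
echo "$appended"               # both lines

echo "replacement" > "$log"    # > empties the file before writing
truncated=$(cat "$log")
echo "$truncated"              # only "replacement"

rm -r "$dir"
```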
Create pipelines with |
Use program1 | program2 to make the standard output of program1 become the standard input of program2.
Although < and > connect input and output to files, a pipeline hooks together two or more running programs. The standard output of the first program becomes the standard input of the second one. In favorable cases, pipelines can run as much as ten times faster than similar code using temporary files. Most of this book is about learning how to hook together the various tools into pipelines of increasing complexity and power. For example:
tr -d '\r' < dos-file.txt | sort > unix-file.txt
This pipeline removes carriage-return characters from the input file, and then sorts the data, sending the resulting output to the destination file.
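To compare a pipeline with the temporary-file approach it replaces, the sketch below builds a tiny file with DOS-style (carriage return plus newline) line endings and processes it both ways. The file names and sample data are made up for illustration, and mktemp -d is assumed to be available. Both versions should produce the same result; the pipeline simply skips writing and rereading the intermediate file.

```sh
#!/bin/sh
# Build a small DOS-style input, then strip carriage returns and sort it
# two ways. File names and data are illustrative assumptions.
dir=$(mktemp -d) || exit 1
printf 'pear\r\napple\r\nfig\r\n' > "$dir/dos-file.txt"

# Two-step version: the intermediate result lives in a temporary file.
tr -d '\r' < "$dir/dos-file.txt" > "$dir/tmp.txt"
sort < "$dir/tmp.txt" > "$dir/unix-file-1.txt"

# Pipeline version: tr's standard output feeds sort's standard input directly.
tr -d '\r' < "$dir/dos-file.txt" | sort > "$dir/unix-file-2.txt"

sorted=$(cat "$dir/unix-file-2.txt")
if cmp -s "$dir/unix-file-1.txt" "$dir/unix-file-2.txt"; then
    same=yes
else
    same=no
fi
echo "$sorted"
echo "identical: $same"
rm -r "$dir"
```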
* * *
tr
Usage
tr [ options ] source-char-list replace-char-list
Purpose
To transliterate characters, for example, converting uppercase characters to lowercase. Options let you remove characters and compress runs of identical characters.
Major options
-c
Complement the values in source-char-list. The characters that tr translates then become those that are not in source-char-list. This option is usually used with one of -d or -s.
-C
Like -c but work on (possibly multibyte) characters, not binary byte values. See Caveats.
-d
Delete characters in source-char-list from the input instead of transliterating them.
-s
"Squeeze out" duplicate characters. Each sequence of repeated characters listed in source-char-list is replaced with a single instance of that character.
Behavior
Acts as a filter, reading characters from standard input and writing them to standard output. Each input character in source-char-list is replaced with the corresponding character in replace-char-list. POSIX-style character and equivalence classes may be used, and tr also supports a notation for repeated characters in replace-char-list. See the manual pages for tr(1) for the details on your system.
Caveats
According to POSIX, the -c option operates on the binary byte values, whereas -C operates on characters as specified by the current locale. As of early 2005, many systems don't yet support the -C option.
* * *
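Here is a brief sketch of the tr options described above. The sample strings and variable names are invented for illustration. The POSIX classes [:upper:] and [:lower:] are used rather than an A-Z range, since the meaning of a range can depend on the locale.

```sh
#!/bin/sh
# tr reads only standard input, so each example pipes text into it.

# Transliterate: map uppercase letters to lowercase.
lower=$(echo 'Hello, World' | tr '[:upper:]' '[:lower:]')
echo "$lower"            # hello, world

# -d: delete every character in source-char-list (here, the vowels).
novowels=$(echo 'transliterate' | tr -d 'aeiou')
echo "$novowels"         # trnsltrt

# -s: squeeze each run of repeated spaces into a single space.
squeezed=$(echo 'too    many   spaces' | tr -s ' ')
echo "$squeezed"         # too many spaces

# -c with -d: delete everything that is NOT a digit or a newline.
digits=$(echo 'Phone: 555-0100' | tr -cd '0-9\n')
echo "$digits"           # 5550100
```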
When working with the Unix tools, it helps to visualize data as being similar to water in a pipeline. Untreated water goes into a water-processing plant and passes through a variety of filters, until the final output is water fit for human consumption.
Similarly, when scripting, you often have raw data in some defined input format, and you need processed data as the result. (Processing may mean any number of things: sorting, summing and averaging, formatting for printing, etc.) You start with the original data, and then construct a pipeline, step by step, where each stage in the pipeline further refines the data.
If you're new to Unix, it may help your visualization if you look at < and > as data "funnels": data goes into the big end and comes out the small end.
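To make the step-by-step idea concrete, the sketch below refines a few lines of raw data in four stages: select, extract, sort, and count. The "access log" lines are made-up sample data, not a real log format from the text, and the exact spacing that uniq -c puts before each count may vary between systems.

```sh
#!/bin/sh
# Refine raw data one stage at a time. The sample data is illustrative.
raw='200 /index.html
404 /missing.html
200 /about.html
404 /old.html
500 /cgi-bin/app
404 /missing.html'

# Stage 1 (grep): keep only the 404 lines.
# Stage 2 (cut):  keep only the second field, the path.
# Stage 3 (sort): bring identical paths together.
# Stage 4 (uniq -c): collapse duplicates and count them.
report=$(printf '%s\n' "$raw" |
    grep '^404 ' |
    cut -d ' ' -f 2 |
    sort |
    uniq -c)
echo "$report"
```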
* * *
Tip
A final tip: when constructing pipelines, try to write them so that the amount of data is reduced at each stage. In other words, if you have two steps that could be done in either order relative to each other, put the one that will reduce the amount of data first in the pipeline. This improves the overall efficiency of your script, since Unix will have to move less data between programs, and each program in turn will have less work to do.
For example, use grep to choose interesting lines before using sort to