Classic Shell Scripting - Arnold Robbins [58]
This command uses a colon to separate fields, sorting on fields 1, 2, and 3, in order. The results of this operation, which become the input to the next stage, look like this:
Franklin, Ben:OSD212:555-0022
Gale, Dorothy:KNS321:555-0044
...
Finally, reformat the output, using awk's printf statement to separate each field with tab characters. The command to do this is:
... | awk -F: '{ printf("%-39s\t%s\t%s\n", $1, $2, $3) }'
For flexibility and ease of maintenance, formatting should always be left until the end. Up to that point, everything is just text strings of arbitrary length.
Here's the complete pipeline:
join -t: $PERSON $OFFICE |
join -t: - $TELEPHONE |
cut -d: -f 2- |
sort -t: -k1,1 -k2,2 -k3,3 |
awk -F: '{ printf("%-39s\t%s\t%s\n", $1, $2, $3) }'
The awk printf statement used here is similar enough to the shell printf command that its meaning should be clear: print the first colon-separated field left-adjusted in a 39-character field, followed by a tab, the second field, another tab, and the third field. Here are the full results:
Franklin, Ben ·OSD212·555-0022
Gale, Dorothy ·KNS321·555-0044
Gale, Toto ·KNS322·555-0045
Hancock, John ·SIG435·555-0099
Jefferson, Thomas ·BMD19·555-0095
Jones, Adrian W. ·OSD211·555-0123
Ross, Betsy ·BMD17·555-0033
Washington, George ·BST999·555-0001
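To see the correspondence with the shell's printf command, here is a minimal sketch that formats a single record with the same format string. The variable names (record, name, office, phone, formatted) are ours, chosen for illustration; the record is the first line of the results above.

```shell
# Format one colon-separated record with the shell's printf,
# mirroring the awk format string used in the pipeline.
record='Franklin, Ben:OSD212:555-0022'

# Split the record on colons into three shell variables.
IFS=: read -r name office phone <<EOF
$record
EOF

# %-39s left-adjusts the name in a 39-character field; \t emits a tab.
formatted=$(printf '%-39s\t%s\t%s\n' "$name" "$office" "$phone")
printf '%s\n' "$formatted"
```

The only real difference is syntactic: awk wants parentheses and commas around the arguments, while the shell command takes them as separate words.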
That is all there is to it! Our entire script is slightly more than 20 lines long, excluding comments, with five main processing steps. We collect it together in one place in Example 5-1.
Example 5-1. Creating an office directory
#! /bin/sh
# Filter an input stream formatted like /etc/passwd,
# and output an office directory derived from that data.
#
# Usage:
# passwd-to-directory < /etc/passwd > office-directory-file
# ypcat passwd | passwd-to-directory > office-directory-file
# niscat passwd.org_dir | passwd-to-directory > office-directory-file
umask 077
PERSON=/tmp/pd.key.person.$$
OFFICE=/tmp/pd.key.office.$$
TELEPHONE=/tmp/pd.key.telephone.$$
USER=/tmp/pd.key.user.$$
trap "exit 1" HUP INT PIPE QUIT TERM
trap "rm -f $PERSON $OFFICE $TELEPHONE $USER" EXIT
awk -F: '{ print $1 ":" $5 }' > $USER
sed -e 's=/.*==' \
    -e 's=^\([^:]*\):\(.*\) \([^ ]*\)=\1:\3, \2=' < $USER | sort > $PERSON
sed -e 's=^\([^:]*\):[^/]*/\([^/]*\)/.*$=\1:\2=' < $USER | sort > $OFFICE
sed -e 's=^\([^:]*\):[^/]*/[^/]*/\([^/]*\)=\1:\2=' < $USER | sort > $TELEPHONE
join -t: $PERSON $OFFICE |
join -t: - $TELEPHONE |
cut -d: -f 2- |
sort -t: -k1,1 -k2,2 -k3,3 |
awk -F: '{ printf("%-39s\t%s\t%s\n", $1, $2, $3) }'
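It can help to trace a single record through the extraction steps. The sketch below does that for one made-up passwd-style line; it assumes, as the sed expressions in the script imply, that the fifth field holds the name, office, and telephone number separated by slashes. The sample line and the lowercase variable names are ours, not part of the script.

```shell
# Hypothetical passwd-style record; field 5 is assumed to be Name/Office/Phone.
line='jones:*:32713:899:Adrian W. Jones/OSD211/555-0123:/home/jones:/bin/ksh'

# Keep only the username and field 5, as the script's awk step does.
user=$(printf '%s\n' "$line" | awk -F: '{ print $1 ":" $5 }')

# Delete from the first slash onward, then reorder to "Last, First".
person=$(printf '%s\n' "$user" |
    sed -e 's=/.*==' \
        -e 's=^\([^:]*\):\(.*\) \([^ ]*\)=\1:\3, \2=')

# The second slash-delimited subfield is the office.
office=$(printf '%s\n' "$user" |
    sed -e 's=^\([^:]*\):[^/]*/\([^/]*\)/.*$=\1:\2=')

# The third slash-delimited subfield is the telephone number.
telephone=$(printf '%s\n' "$user" |
    sed -e 's=^\([^:]*\):[^/]*/[^/]*/\([^/]*\)=\1:\2=')

printf '%s\n' "$user" "$person" "$office" "$telephone"
```

Each result is a key:value pair that begins with the username, which is what lets join stitch the pieces back together later.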
The real power of shell scripting shows itself when we want to modify the script to do a slightly different job, such as insertion of the job title from a separately maintained key:jobtitle file. All that we need to do is modify the final pipeline to look something like this:
join -t: $PERSON /etc/passwd.job-title |    # Extra join with job title
join -t: - $OFFICE |
join -t: - $TELEPHONE |
cut -d: -f 2- |
sort -t: -k1,1 -k3,3 -k4,4 |                # Modify sort command
awk -F: '{ printf("%-39s\t%-23s\t%s\t%s\n",
$1, $2, $3, $4) }'                          # And formatting command
The total cost for the extra directory field is one more join, a change in the sort fields, and a small tweak in the final awk formatting command.
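The following self-contained sketch exercises the four-way join on a tiny data set. Everything in it is invented for illustration: the scratch directory, the file names, the usernames, and the job titles. It also assumes that mktemp -d is available, which is common but not guaranteed by POSIX.

```shell
# Scratch directory for the sample key:value files (hypothetical names).
tmpdir=$(mktemp -d)

# Four key:value files, each already sorted on the key, as join requires.
# The usernames and job titles are made-up sample data.
printf '%s\n' 'ben:Franklin, Ben' 'jones:Jones, Adrian W.' > "$tmpdir/person"
printf '%s\n' 'ben:Printer'       'jones:Engineer'         > "$tmpdir/jobtitle"
printf '%s\n' 'ben:OSD212'        'jones:OSD211'           > "$tmpdir/office"
printf '%s\n' 'ben:555-0022'      'jones:555-0123'         > "$tmpdir/telephone"

# Same shape as the modified pipeline: three joins, then drop the key,
# sort by name, office, and telephone, and format with tabs.
directory=$(
    join -t: "$tmpdir/person" "$tmpdir/jobtitle" |
        join -t: - "$tmpdir/office" |
        join -t: - "$tmpdir/telephone" |
        cut -d: -f 2- |
        sort -t: -k1,1 -k3,3 -k4,4 |
        awk -F: '{ printf("%-39s\t%-23s\t%s\t%s\n", $1, $2, $3, $4) }'
)
printf '%s\n' "$directory"

rm -rf "$tmpdir"
```

Note that the job-title file must be sorted on its key, just like the others; join silently drops records when its inputs are out of order.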
Because we were careful to preserve special field delimiters in our output, we can trivially prepare useful alternative directories like this:
passwd-to-directory < /etc/passwd | sort -t'·' -k2,2 > dir.by-office
passwd-to-directory < /etc/passwd | sort -t'·' -k3,3 > dir.by-telephone
As usual, · represents an ASCII tab character.
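When typing these commands, the tab has to be a real tab character. One portable way to get one, sketched here, is to capture the output of printf in a variable and hand that to sort. The variable names and the two sample lines are ours; in practice the input would come from passwd-to-directory.

```shell
# A literal tab for sort's -t option; command substitution keeps the tab.
TAB=$(printf '\t')

# Two sample directory lines in the tab-separated output layout.
listing=$(printf '%s\t%s\t%s\n' \
    'Franklin, Ben'    'OSD212' '555-0022' \
    'Jones, Adrian W.' 'OSD211' '555-0123')

# Re-sort the same listing on the office and telephone fields.
by_office=$(printf '%s\n' "$listing" | sort -t"$TAB" -k2,2)
by_telephone=$(printf '%s\n' "$listing" | sort -t"$TAB" -k3,3)

printf '%s\n' "$by_office"
printf '%s\n' "$by_telephone"
```

Because the padding spaces produced by %-39s belong to the first field, they do not disturb sorting on the second or third.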
A critical assumption of our program is that there is a unique key for each data record. With that unique key, separate views of the data can be maintained in files as key:value pairs. Here, the key was a Unix username, but in larger contexts, it could be a book number (ISBN), credit card number, employee number, national retirement system number, part number, student number,