Classic Shell Scripting - Arnold Robbins [56]
Gale, Dorothy ·KNS321·555-0044
...
where · represents an ASCII tab character. We put the personal name in conventional directory order (family name first), padding the name field with spaces to a convenient fixed length. We prefix the office number and telephone with tab characters to preserve some useful structure that other tools can exploit.
Scripting languages, such as awk, were designed to make such tasks easy because they provide automated input processing and splitting of input records into fields, so we could write the conversion job entirely in such a language. However, we want to show how to achieve the same thing with other Unix tools.
For each password file line, we need to extract field five, split it into three subfields, rearrange the names in the first subfield, and then write an office directory line to a sorting process.
awk and cut are convenient tools for field extraction:
... | awk -F: '{ print $5 }' | ...
... | cut -d: -f5 | ...
There is a slight complication in that we have two field-processing tasks that we want to keep separate for simplicity, but we need to combine their output to make a directory entry. The join command is just what we need: it expects two input files, each ordered by a common unique key value, and joins lines sharing a common key into a single output line, with user control over which fields are output.
Since our directory entries contain three fields, to use join we need to create three intermediate files containing the colon-separated pairs key:person, key:office, and key:telephone, one pair per line. These can all be temporary files, since they are derived automatically from the password file.
What key do we use? It just needs to be unique, so it could be the record number in the original password file, but in this case it can also be the username, since we know that usernames are unique in the password file and they make more sense to humans than numbers do. Later, if we decide to augment our directory with additional information, such as job title, we can create another nontemporary file with the pair key:jobtitle and add it to the processing stages.
Instead of hardcoding input and output filenames into our program, it is more flexible to write the program as a filter so that it reads standard input and writes standard output. For commands that are used infrequently, it is advisable to give them descriptive, rather than short and cryptic, names, so we start our shell program like this:
#! /bin/sh
# Filter an input stream formatted like /etc/passwd,
# and output an office directory derived from that data.
#
# Usage:
# passwd-to-directory < /etc/passwd > office-directory-file
# ypcat passwd | passwd-to-directory > office-directory-file
# niscat passwd.org_dir | passwd-to-directory > office-directory-file
Since the password file is publicly readable, any data derived from it is public as well, so there is no real need to restrict access to our program's intermediate files. However, because all of us at times have to deal with sensitive data, it is good to develop the programming habit of allowing file access only to those users or processes that need it. We therefore reset the umask (see Section B.6.1.3 in Appendix B) as the first action in our program:
umask 077 Restrict temporary file access to just us
For accountability and debugging, it is helpful to have some commonality in temporary filenames, and to avoid cluttering the current directory with them: we name them with the prefix /tmp/pd.. To guard against name collisions if multiple instances of our program are running at the same time, we also need the names to be unique: the process number, available in the shell variable $$, provides a distinguishing suffix. (This use of $$ is described in more detail in Chapter 10.) We therefore define these shell variables to represent our temporary files:
PERSON=/tmp/pd.key.person.$$ Unique temporary filenames
OFFICE=/tmp/pd.key.office.$$
TELEPHONE=/tmp/pd.key.telephone.$$
USER=/tmp/pd.key.user.$$
When the job terminates,