Classic Shell Scripting - Arnold Robbins [158]
FILES.last02 FILES.last01 DIRECTORIES.all \
DIRECTORIES.last31 DIRECTORIES.last14 \
DIRECTORIES.last07 DIRECTORIES.last02 DIRECTORIES.last01
do
sed -e "s=^[.]/=$WD/=" -e "s=^[.]$=$WD=" $TMP/$i.$$ |
LC_ALL=C sort > $TMP/$i.$$.tmp
cmp -s $TMP/$i.$$.tmp $i || mv $TMP/$i.$$.tmp $i
done
Finding Problem Files
In Section 10.1, we noted the difficulties presented by filenames containing special characters, such as newline. GNU find has the -print0 option to display filenames as NUL-terminated strings. Since pathnames can legally contain any character except NUL, this option provides a way to produce lists of filenames that can be parsed unambiguously.
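To make the ambiguity concrete, here is a minimal sketch (not from the book). It builds a scratch directory holding one ordinary file and one file whose name contains a newline, then counts the names both ways. It assumes mktemp -d, which GNU and BSD systems provide, and a find that supports -print0. The variable names dir, lines, and names are illustrative only.

```shell
# Scratch directory with one ordinary file and one whose name
# contains an embedded newline.
dir=$(mktemp -d) || exit 1
touch "$dir/plain"
touch "$dir/two
lines"

# -print ends each name with a newline, so the embedded newline
# splits the second name and the line count comes out too high.
lines=$(find "$dir" -type f -print | wc -l)

# -print0 ends each name with exactly one NUL, so counting the NULs
# (tr -cd deletes every byte that is not NUL) gives the true count.
names=$(find "$dir" -type f -print0 | tr -cd '\000' | wc -c)

echo "lines: $lines"    # 3 lines ...
echo "names: $names"    # ... but only 2 files
rm -rf "$dir"
```

The newline-terminated stream reports three lines for two files; the NUL-terminated stream cannot be fooled this way, because no filename can contain a NUL.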
It is hard to parse such lists with typical Unix tools, most of which assume line-oriented text input. However, in a compiled language with byte-at-a-time input, such as C, C++, or Java, it is straightforward to write a program to diagnose the presence of problematic filenames in your filesystem. Sometimes they get there by simple programmer error, but other times, they are put there by attackers who try to hide their presence by disguising filenames.
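The text's suggestion is a compiled program. Within the shell, one way to avoid parsing a list at all is to let find do the matching and hand each match to a child shell as a separate argument with -exec ... {} +, so no delimiter is ever involved. This is a minimal sketch under that assumption, not the program the text has in mind: show_problem_files is a hypothetical helper name, and the pattern relies on find accepting POSIX character classes such as [:space:] and [:cntrl:] in -name patterns, which GNU and BSD find do.

```shell
# show_problem_files is a hypothetical helper name.  find does the
# matching, so no list of names ever has to be parsed.
show_problem_files() {
    # -name looks only at the last pathname component.  The bracket
    # expression matches any whitespace (space, tab, newline, ...) or
    # control character anywhere in that component.
    find "${1:-.}" -name '*[[:space:][:cntrl:]]*' -exec sh -c '
        # Each pathname arrives as a separate argument, intact.
        for path in "$@"; do
            # Make problem characters visible: space -> S,
            # newline -> N, tab -> T; then end the report line.
            printf "%s" "$path" | tr " \n\t" "SNT"
            echo
        done
    ' sh {} +
}
```

Run as show_problem_files . in the directory from the example that follows, it should report ./S., ./S.., ./.N, and ./..S..S..S.SNNNSS, in whatever order find happens to visit them.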
For example, suppose that you did a directory listing and got output like this:
$ ls                                    List directory
. ..
At first glance, this seems innocuous, since we know that empty directories always contain two special hidden dotted files for the current and parent directory. However, notice that we did not use the -a option, so we should not have seen any hidden files, and also, there appears to be a space before the first dot in the output. Something is just not right! Let's apply find and od to investigate further:
$ find -print0 | od -ab                 Convert NUL-terminated filenames to octal and ASCII
0000000   . nul   .   /  sp   . nul   .   /  sp   .   . nul   .   /   .
        056 000 056 057 040 056 000 056 057 040 056 056 000 056 057 056
0000020  nl nul   .   /   .   .  sp   .   .  sp   .   .  sp   .  sp  nl
        012 000 056 057 056 056 040 056 056 040 056 056 040 056 040 012
0000040  nl  nl  sp  sp nul
        012 012 040 040 000
0000045
We can make this somewhat more readable with the help of tr, turning spaces into S, newlines into N, and NULs into newlines:
$ find -print0 | tr ' \n\0' 'SN\n'      Make problem characters visible as S and N
.
./S.
./S..
./.N
./..S..S..S.SNNNSS
Now we can see what is going on: we have the normal dot directory, then a file named space-dot, another named space-dot-dot, yet another named dot-newline, and finally one named dot-dot-space-dot-dot-space-dot-dot-space-dot-space-newline-newline-newline-space-space. Unless someone was practicing Morse code in your filesystem, these files look awfully suspicious, and you should investigate them further before you get rid of them.
* * *
[5] Available at ftp://ftp.gnu.org/gnu/findutils/.
[6] Available at ftp://ftp.geekreview.org/slocate/.
[7] Since users are so used to seeing sorted lists from ls and shell wildcard expansions, many assume that directories must store names in sorted order. That is not the case, but it is usually not until you write a program that uses the opendir(), readdir(), and closedir() library calls that you discover the probable need for qsort() as well!
[8] Our thanks go to Pieter J. Bowman at the University of Utah for this example.
Running Commands: xargs
When find produces a list of files, it is often useful to be able to supply that list as arguments to another command. Normally, this is done with the shell's command substitution feature, as in this example of searching for the symbol POSIX_OPEN_MAX in system header files:
$ grep POSIX_OPEN_MAX /dev/null $(find /usr/include -type f | sort)
/usr/include/limits.h:#define _POSIX_OPEN_MAX 16
Whenever you write a program or a command that deals with a list of objects, you should make sure that it behaves properly if the list is empty. Because grep reads standard input when it is given no file arguments, we supplied an argument of /dev/null to ensure that it does not hang waiting for terminal input if find produces no output: that will not happen here, but it is good to develop