Classic Shell Scripting - Arnold Robbins [208]
+(dave|fred|bob) matches any of the above except the null string.
?(dave|fred|bob) matches the null string, dave, fred, or bob.
!(dave|fred|bob) matches anything except dave, fred, or bob.
It is worth emphasizing again that shell regular expressions can still contain standard shell wildcards. Thus, the shell wildcard ? (match any single character) is the equivalent of . (dot) in egrep or awk, and the shell's character set operator [...] is the same as in those utilities.[2] For example, the expression +([[:digit:]]) matches a number: i.e., one or more digits. The shell wildcard character * is equivalent to the shell regular expression *(?). You can even nest the regular expressions: +([[:digit:]]|!([[:upper:]])) matches one or more digits or nonuppercase letters.
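The operators above can be tried on plain strings, without any files, by using the [[ ... ]] test in Bash. The following is a minimal sketch, assuming Bash with extglob available; the function name classify is hypothetical and used only for illustration.

```bash
#!/usr/bin/env bash
shopt -s extglob   # enable the extended pattern operators (Bash)

# classify is a hypothetical helper: it reports which pattern its argument matches.
classify() {
    if [[ $1 == +([[:digit:]]) ]]; then
        echo "number"            # one or more digits
    elif [[ $1 == ?(dave|fred|bob) ]]; then
        echo "name-or-empty"     # the null string, dave, fred, or bob
    elif [[ $1 == !(dave|fred|bob) ]]; then
        echo "other"             # anything except dave, fred, or bob
    fi
}

classify 2024        # number
classify fred        # name-or-empty
classify ""          # name-or-empty
classify frederick   # other
```

Note that frederick falls through to the last branch: the pattern must match the entire string, so ?(dave|fred|bob) does not match a string that merely starts with fred.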
Two egrep and awk regexp operators do not have equivalents in the shell. They are:
The beginning- and end-of-line operators ^ and $
The beginning- and end-of-word operators \< and \>
Essentially, the ^ and $ are implied as always being there. Surround a pattern with * characters to disable this. This example illustrates the difference:
$ ls                               List files
biff bob frederick shishkabob
$ shopt -s extglob                 Enable extended pattern matching (Bash)
$ echo @(dave|fred|bob)            Files that match only dave, fred, or bob
bob
$ echo *@(dave|fred|bob)*          Add wildcard characters
bob frederick shishkabob           More files matched
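The same implied anchoring applies when a pattern is tested against a string rather than against filenames. The sketch below assumes Bash with extglob; the helper names match_whole and match_anywhere are hypothetical.

```bash
#!/usr/bin/env bash
shopt -s extglob   # enable the extended pattern operators (Bash)

# Hypothetical helpers: each succeeds (exit status 0) if its argument matches.
match_whole()    { [[ $1 == @(dave|fred|bob) ]]; }     # implied ^ and $
match_anywhere() { [[ $1 == *@(dave|fred|bob)* ]]; }   # surrounding * lifts the anchoring

for name in biff bob frederick shishkabob; do
    if match_whole "$name"; then
        echo "$name: whole match"
    fi
    if match_anywhere "$name"; then
        echo "$name: contains a match"
    fi
done
```

Only bob passes the first test, while bob, frederick, and shishkabob all pass the second.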
ksh93 supports even more pattern-matching operators. However, since the point of this section is to cover what is common to both bash and ksh93, we stop here. For the details, see Learning the Korn Shell (O'Reilly).
Brace Expansion
Brace expansion is a feature borrowed from the Berkeley C shell, csh. It is supported by both shells. Brace expansion is a way of saving typing when you have strings that are prefixes or suffixes of each other. For example, suppose that you have the following files:
$ ls
cpp-args.c cpp-lex.c cpp-out.c cpp-parse.c
You could type vi cpp-{args,lex,parse}.c if you wished to edit three out of the four C files, and the shell would expand this into vi cpp-args.c cpp-lex.c cpp-parse.c. Furthermore, brace substitutions may be nested. For example:
$ echo cpp-{args,l{e,o}x,parse}.c
cpp-args.c cpp-lex.c cpp-lox.c cpp-parse.c
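Because brace expansion is purely textual, the named files need not exist, which makes it easy to experiment with. The following sketch assumes Bash; the filenames report, src, test, main.c, and util.c are made up for illustration.

```bash
#!/usr/bin/env bash

# An empty alternative is allowed, which gives a compact "name, name.bak" idiom.
echo report{,.bak}
# prints: report report.bak

# Two brace groups in one word produce every combination, leftmost group varying slowest.
echo {src,test}/{main,util}.c
# prints: src/main.c src/util.c test/main.c test/util.c
```

The first form is commonly seen as cp report{,.bak}, which the shell expands to cp report report.bak before cp runs.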
Process Substitution
Process substitution allows you to open multiple process streams and feed them into a single program for processing. For example:
awk '...' <(generate_data) <(generate_more_data)
(Note that the parentheses are part of the syntax; you type them literally.) Here, generate_data and generate_more_data represent arbitrary commands, including pipelines, that produce streams of data. The awk program processes each stream in turn, not realizing that the data is coming from multiple sources. This is shown graphically in Figure 14-1.a.
Figure 14-1. Process substitution for both input and output data streams
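As a concrete version of the awk command above, the following sketch uses two printf commands as stand-ins for generate_data and generate_more_data; the numbers are made up for illustration, and Bash is assumed.

```bash
#!/usr/bin/env bash

# awk receives two file names and reads them one after the other,
# summing the first field of every line from both streams.
total=$(awk '{ sum += $1 } END { print sum }' \
            <(printf '%s\n' 1 2 3) \
            <(printf '%s\n' 10 20 30))
echo "$total"
# prints: 66

# To see what the shell actually passes to the program, echo the substitution.
# On many systems this prints a name under /dev/fd; the exact name varies.
echo <(true)
```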
Process substitution may also be used for output, particularly when combined with the tee program, which sends its input to multiple output files and to standard output. For example:
generate_data | tee >(sort | uniq > sorted_data) \
                    >(mail -s 'raw data' joe) > raw_data
This command uses tee to (1) send the data to a pipeline that sorts and saves the data, (2) send the data to the mail program for user joe, and (3) redirect the original data into a file. This is represented graphically in Figure 14-1.b. Process substitution, combined with tee, frees you from the straight "one input, one output" paradigm of traditional Unix pipes, letting you split data into multiple output streams, and coalesce multiple input data streams into one.
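A version of this command that can be run without a mail setup is sketched below. It assumes Bash; wc -l stands in for the mail command, printf stands in for generate_data, and the temporary directory and the file name line_count are made up for illustration.

```bash
#!/usr/bin/env bash

tmpdir=$(mktemp -d)   # scratch directory for the three output files

# tee copies its input to both process substitutions and to standard output,
# which is redirected into raw_data.
printf '%s\n' pear apple pear fig |
    tee >(sort | uniq > "$tmpdir/sorted_data") \
        >(wc -l > "$tmpdir/line_count") > "$tmpdir/raw_data"

# raw_data is complete as soon as the pipeline finishes.
cat "$tmpdir/raw_data"
```

One caveat worth knowing: in Bash the commands inside >(...) run asynchronously, so sorted_data and line_count may not be fully written at the instant tee exits. A script that reads those files immediately afterward may need to wait briefly or otherwise synchronize.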
Process substitution is available only on Unix systems that support the /dev/fd/n special files for named access to already open file descriptors. Most modern Unix systems, including GNU/Linux, support this feature. As with brace expansion, it is enabled by default