Classic Shell Scripting - Arnold Robbins [27]
We assume that you have had some exposure to regular expressions and text matching before picking up this book. If so, these subsections summarize how you can expect to use regular expressions for portable shell scripting.
If you've had no exposure at all to regular expressions, the material here may be a little too condensed for you, and you should detour to a more introductory source, such as Learning the Unix Operating System (O'Reilly) or sed & awk (O'Reilly). Since regular expressions are a fundamental part of the Unix tool-using and tool-building paradigms, any investment you make in learning how to use them, and use them well, will be amply rewarded, time after time.
If, on the other hand, you've been chopping, slicing, and dicing text with regular expressions for years, you may find our coverage cursory. In that case, we recommend that you review the first part, which summarizes POSIX BREs and EREs in tabular form, skip the rest of the section, and move on to a more in-depth source, such as Mastering Regular Expressions (O'Reilly).
What Is a Regular Expression?
Regular expressions are a notation that lets you search for text that fits a particular criterion, such as "starts with the letter a." The notation lets you write a single expression that can select, or match, multiple data strings.
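As a minimal sketch of the "starts with the letter a" criterion, the following uses grep with the ^ anchor to select lines that begin with a. The sample words and the variable names are made up for illustration.

```shell
# Hypothetical sample data: one word per line.
words='apple
banana
avocado
cherry'

# ^a matches only lines whose first character is the letter a,
# so one expression selects several different data strings.
starts_with_a=$(printf '%s\n' "$words" | grep '^a')

printf '%s\n' "$starts_with_a"
# prints:
# apple
# avocado
```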
Above and beyond traditional Unix regular expression notation, POSIX regular expressions let you:
Write regular expressions that express locale-specific character sequence orderings and equivalences
Write your regular expressions in a way that does not depend upon the underlying character set of the system
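One way to see the character-set independence is a POSIX character class inside a bracket expression. The sketch below is illustrative only: the input lines are invented, and LC_ALL=C is set simply to make the result predictable on any system.

```shell
# Hypothetical input lines.
input='Alpha
beta
42
Gamma'

# [[:upper:]] names the class "uppercase letter" instead of spelling out
# a range such as A-Z, whose meaning depends on the character set and
# the locale's collation order.
capitalized=$(printf '%s\n' "$input" | LC_ALL=C grep '^[[:upper:]]')

printf '%s\n' "$capitalized"
# prints:
# Alpha
# Gamma
```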
A large number of Unix utilities derive their power from regular expressions of one form or another. A partial list includes the following:
The grep family of tools for finding matching lines of text: grep and egrep, which are always available, as well as the nonstandard but useful agrep utility[1]
The sed stream editor, for making changes to an input stream, described later in the chapter
String processing languages, such as awk, Icon, Perl, Python, Ruby, Tcl, and others
File viewers (sometimes called pagers), such as more, page, and pg, which are common on commercial Unix systems, and the popular less pager[2]
Text editors, such as the venerable ed line editor, the standard vi screen editor, and popular add-on editors such as emacs, jed, jove, vile, vim, and others
Because regular expressions are so central to Unix use, it pays to master them, and the earlier you do so, the better off you'll be.
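To illustrate how one pattern carries across several of these tools, here is a small sketch that applies the same regular expression with grep, sed, and awk. The log lines and variable names are assumptions made for the example.

```shell
# Hypothetical log lines.
log='error: disk full
info: backup done
error: network down'

# grep: print the lines that match the pattern.
grep_out=$(printf '%s\n' "$log" | grep '^error')

# sed: -n suppresses the default output; the p command prints matching lines.
sed_out=$(printf '%s\n' "$log" | sed -n '/^error/p')

# awk: a bare /pattern/ with no action prints each matching record.
awk_out=$(printf '%s\n' "$log" | awk '/^error/')

printf '%s\n' "$grep_out"
# prints:
# error: disk full
# error: network down
```

All three commands select the same two lines; only the surrounding syntax differs from tool to tool.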
In terms of the nuts and bolts, regular expressions are built from two basic components: ordinary characters and special characters. An ordinary character is any character that isn't special, as defined in the following table. In some contexts even special characters are treated as ordinary characters. Special characters are often called metacharacters, a term that we use throughout the rest of this chapter. Table 3-1 lists the POSIX BRE and ERE metacharacters.
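The difference between an ordinary character and a metacharacter can be sketched with the dot. The input lines below are invented for illustration; the example shows the dot acting as a metacharacter, then being treated as an ordinary character by a backslash and by a bracket expression.

```shell
# Hypothetical input lines.
lines='a.c
abc
a*c'

# Unescaped, the dot is a metacharacter and matches any single character.
any_char=$(printf '%s\n' "$lines" | grep '^a.c$')

# A backslash turns off the special meaning, so only a literal dot matches.
literal_dot=$(printf '%s\n' "$lines" | grep '^a\.c$')

# Inside a bracket expression, the dot is also an ordinary character.
bracket_dot=$(printf '%s\n' "$lines" | grep '^a[.]c$')

printf '%s\n' "$any_char"      # prints all three lines
printf '%s\n' "$literal_dot"   # prints a.c
printf '%s\n' "$bracket_dot"   # prints a.c
```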
Table 3-1. POSIX BRE and ERE metacharacters
Character | BRE / ERE | Meaning in a pattern
\ | Both | Usually, turn off the special meaning of the following character. Occasionally, enable a special meaning for the following character, such as for \(...\) and \{...\}.
. | Both | Match any single character except NUL. Individual programs may also disallow matching newline.
* | Both | Match any number (or none) of the single character that immediately precedes it. For EREs, the preceding character can instead be a regular expression. For example, since . (dot) means any character, .* means "match any number of any character." For BREs, * is not special if it's the first character of a regular expression.
^ | Both | Match the following regular expression at the beginning of the line or string. BRE: special only at the beginning of a regular expression. ERE: special everywhere.
$ | Both | Match the