Classic Shell Scripting - Arnold Robbins [34]
Figure 3-1. Reading a complicated regular expression
The upshot is that this single regular expression matches multiple successive occurrences of either read or write, possibly separated by whitespace characters.
The use of a * after the [[:space:]] is something of a judgment call. By using a * and not a +, the match gets words at the end of a line (or string). However, this opens up the possibility of matching words with no intervening whitespace at all. Crafting regular expressions often requires such judgment calls. How you build your regular expressions will depend on both your input data and what you need to do with that data.
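The trade-off can be tried directly with grep -E. The sketch below assumes that the expression in Figure 3-1 has the form ((read|write)[[:space:]]*)+ (the figure itself is not reproduced here), and compares it with a hypothetical variant that uses + instead. The matches_line helper is our own name; it uses grep -x so that the expression must account for the entire line.

```shell
# Assumed form of the Figure 3-1 expression, plus a variant using + (both are assumptions).
star='((read|write)[[:space:]]*)+'
plus='((read|write)[[:space:]]+)+'

# matches_line ERE STRING: succeed if ERE matches the whole of STRING (grep -x).
matches_line () {
    printf '%s\n' "$2" | grep -E -x -q -e "$1"
}

for s in 'read write' 'read write ' 'readwrite'; do
    for re in "$star" "$plus"; do
        if matches_line "$re" "$s"; then r='match'; else r='no match'; fi
        printf '%-30s [%s]: %s\n' "$re" "$s" "$r"
    done
done
```

With *, the final word of "read write" is matched even though nothing follows it, but "readwrite" is accepted too. With +, "readwrite" is rejected, and so is the last word of a line that has no trailing whitespace.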
Finally, grouping is helpful when using alternation together with the ^ and $ anchor characters. Because | has the lowest precedence of all the operators, the regular expression ^abcd|efgh$ means "match abcd at the beginning of the string, or match efgh at the end of the string." This is different from ^(abcd|efgh)$, which means "match a string containing exactly abcd or exactly efgh."
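A short experiment with grep -E shows the difference. The sample lines and the sample function are made up for illustration.

```shell
# Hypothetical sample input, one string per line.
sample () { printf '%s\n' 'abcdXYZ' 'XYZefgh' 'abcd' 'efgh' 'XabcdX'; }

# No grouping: "abcd at the beginning" OR "efgh at the end".
ungrouped=$(sample | grep -E '^abcd|efgh$')

# Grouping: the whole line must be exactly abcd or exactly efgh.
grouped=$(sample | grep -E '^(abcd|efgh)$')

printf 'ungrouped:\n%s\n\ngrouped:\n%s\n' "$ungrouped" "$grouped"
```

The ungrouped expression selects abcdXYZ, XYZefgh, abcd, and efgh. The grouped one selects only abcd and efgh. Neither selects XabcdX.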
Anchoring text matches
The ^ and $ have the same meaning as in BREs: anchor the regular expression to the beginning or end of the text string (or line). There is one significant difference, though. In EREs, ^ and $ are always metacharacters. Thus, regular expressions such as ab^cd and ef$gh are valid, but cannot match anything, since the text preceding the ^ and the text following the $ prevent them from matching "the beginning of the string" and "the end of the string," respectively. As with the other metacharacters, they do lose their special meaning inside bracket expressions.
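The following sketch contrasts the two flavors using grep (BRE) and grep -E (ERE). It assumes a grep that follows the POSIX rules described here, as GNU grep does; the input lines and the lines function are invented for the test. Each command counts matching lines with -c, and the || true keeps a count of zero from being treated as a failure.

```shell
# Hypothetical sample input containing literal ^ and $ characters.
lines () { printf '%s\n' 'ab^cd' 'ef$gh' 'abcd' 'efgh'; }

# ERE: ^ and $ are always anchors, so these cannot match any line.
ere_caret=$(lines | grep -E -c -e 'ab^cd' || true)
ere_dollar=$(lines | grep -E -c -e 'ef$gh' || true)

# BRE: a ^ that is not first and a $ that is not last are ordinary characters.
bre_caret=$(lines | grep -c -e 'ab^cd' || true)
bre_dollar=$(lines | grep -c -e 'ef$gh' || true)

# ERE again: inside a bracket expression, both characters are literal.
ere_bracket=$(lines | grep -E -c -e 'ab[$^]cd' || true)

printf 'ERE ab^cd: %s  ERE ef$gh: %s\n' "$ere_caret" "$ere_dollar"
printf 'BRE ab^cd: %s  BRE ef$gh: %s\n' "$bre_caret" "$bre_dollar"
printf 'ERE ab[$^]cd: %s\n' "$ere_bracket"
```

The ERE counts are 0, the BRE counts are 1, and the bracket-expression form matches the one line that contains a literal ^.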
ERE operator precedence
Operator precedence applies to EREs as it does to BREs. Table 3-6 provides the precedence for the ERE operators, from highest to lowest.
Table 3-6. ERE operator precedence from highest to lowest
Operator             Meaning
[. .] [= =] [: :]    Bracket symbols for character collation
\metacharacter       Escaped metacharacters
[ ]                  Bracket expressions
( )                  Grouping
* + ? { }            Repetition of the preceding regular expression
no symbol            Concatenation
^ $                  Anchors
|                    Alternation
Regular Expression Extensions
Many programs provide extensions to regular expression syntax. Typically, such extensions take the form of a backslash followed by an additional character, to create new operators. This is similar to the use of a backslash in \(...\) and \{...\} in POSIX BREs.
The most common extensions are the operators \< and \>, which match the beginning and end of a "word," respectively. Words are made up of letters, digits, and underscores. We call such characters word-constituent.
The beginning of a word occurs at either the beginning of a line or the first word-constituent character following a nonword-constituent character. Similarly, the end of a word occurs at the end of a line, or after the last word-constituent character before a nonword-constituent one.
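These rules can be exercised with grep, assuming a version that supports \< and \> (GNU grep does; check the manpage elsewhere). The word list and the words function are invented for illustration. They include an underscore and a digit to show that both count as word-constituent characters.

```shell
# Hypothetical sample input, one string per line.
words () { printf '%s\n' 'cat' 'catalog' 'concat' 'my_cat' 'a cat sat' 'cat9'; }

# cat at the beginning of a word.
starts=$(words | grep -e '\<cat')

# cat at the end of a word.
ends=$(words | grep -e 'cat\>')

# cat as a complete word.
exact=$(words | grep -e '\<cat\>')

printf 'starts a word:\n%s\n\nends a word:\n%s\n\nwhole word:\n%s\n' \
    "$starts" "$ends" "$exact"
```

Here \<cat selects cat, catalog, "a cat sat", and cat9, but not concat or my_cat, because the underscore before cat is word-constituent. In the other direction, cat\> selects cat, concat, my_cat, and "a cat sat", but not catalog or cat9. Only cat and "a cat sat" contain cat as a complete word.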
In practice, word matching is intuitive and straightforward. The regular expression \
Although standardized by POSIX only for the ex editor, word matching is universally supported by the ed, ex, and vi editors that come standard with every commercial Unix system. Word matching is also supported on the "clone" versions of these programs that come with GNU/Linux and BSD systems, as well as in emacs, vim, and vile. Most GNU utilities support it as well. Additional Unix programs that support word matching often include grep and sed, but you should double-check the manpages for the commands on your system.
GNU versions of the standard utilities that deal with regular expressions typically support a number of additional operators. These operators are outlined in Table 3-7.
Table 3-7. Additional GNU regular expression operators
Operator