Classic Shell Scripting - Arnold Robbins [33]
Matching multiple regular expressions with one expression
EREs have the most notable differences from BREs in the area of matching multiple characters. The * does work the same as in BREs.[5]
Interval expressions are also available in EREs; however, they are written using plain braces, not braces preceded by backslashes. Thus, our previous examples of "exactly five occurrences of a" and "between 10 and 42 instances of q" are written a{5} and q{10,42}, respectively. Use \{ and \} to match literal brace characters. POSIX purposely leaves the meaning of a { without a matching } in an ERE as "undefined."
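As a quick check of the interval syntax, the following sketch feeds a few made-up sample lines to grep -E. The sample strings and variable names are invented for illustration, and the ^ and $ anchors are added only so that the whole line must match.

```shell
# Sample lines are hypothetical; only the line with exactly five a's should survive.
interval_matches=$(printf '%s\n' aaaa aaaaa aaaaaa | grep -E '^a{5}$')
echo "$interval_matches"

# In an ERE, a literal brace is written with a backslash.
brace_matches=$(printf '%s\n' 'x{1}' 'x1' | grep -E 'x\{1\}')
echo "$brace_matches"
```

Without the anchors, a{5} would also select the six-character line, since that line contains five consecutive a characters.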
EREs have two additional metacharacters for finer-grained matching control, as follows:
?
Match zero or one of the preceding regular expression
+
Match one or more of the preceding regular expression
You can think of the ? character as meaning "optional." In other words, text matching the preceding regular expression is either present or it's not. For example, ab?c matches both ac and abc, but nothing else. (Compare this to ab*c, which can match any number of intermediate b characters.)
The + character is conceptually similar to the * metacharacter, except that at least one occurrence of text matching the preceding regular expression must be present. Thus, ab+c matches abc, abbc, abbbc, and so on, but does not match ac. You can always replace a regular expression of the form ab+c with abb*c; however, the + can save a lot of typing (and the potential for typos!) when the preceding regular expression is complicated.
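The behavior of ? and + can be tried out with a short sketch along these lines. The sample strings and variable names are assumptions made for the illustration, and the patterns are anchored so that each line must match in full.

```shell
# Hypothetical sample input, one candidate string per line.
samples=$(printf '%s\n' ac abc abbc abbbc)

# ab?c: the b is optional, so only ac and abc are selected.
optional=$(printf '%s\n' "$samples" | grep -E '^ab?c$')

# ab+c: at least one b is required, so ac is rejected.
one_or_more=$(printf '%s\n' "$samples" | grep -E '^ab+c$')

# abb*c: the longhand equivalent of ab+c.
expanded=$(printf '%s\n' "$samples" | grep -E '^abb*c$')

echo "$optional"
echo "$one_or_more"
```

Here one_or_more and expanded hold the same three lines, which is the equivalence between ab+c and abb*c described above.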
Alternation
Bracket expressions let you easily say "match this character, or that character, or ...." However, they don't let you specify "match this sequence, or that sequence, or ...." You can do this using the alternation operator, which is the vertical bar or pipe character (|). Simply write the two sequences of characters, separated by a pipe. For example, read|write matches both read and write, fast|slow matches both fast and slow, and so on. You may use more than one: sleep|doze|dream|nod off|slumber matches all five expressions.
The | character has the lowest precedence of all the ERE operators. Thus, the lefthand side extends all the way to the left of the operator, to either a preceding | character or the beginning of the regular expression. Similarly, the righthand side of the | extends all the way to the right of the operator, to either a succeeding | character or the end of the whole regular expression. The implications of this are discussed in the next section.
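One way to see the low precedence of | is to combine it with anchors, as in the sketch below. The sample lines and variable names are made up for illustration. Because each side of the | extends to the edge of the whole expression, ^read|write$ means "begins with read, or ends with write," not "consists entirely of read or write."

```shell
# Hypothetical sample lines; each contains read or write somewhere.
words=$(printf '%s\n' 'read only' 'overwrite' 'rewrite it' 'already')

# Plain alternation: a line is selected if it contains either sequence.
any_match=$(printf '%s\n' "$words" | grep -E 'read|write')

# The ^ belongs only to the left alternative and the $ only to the right one.
anchored=$(printf '%s\n' "$words" | grep -E '^read|write$')

echo "$any_match"
echo "$anchored"
```

The first command selects all four lines. The second selects only "read only" and "overwrite".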
Grouping
You may have noticed that for EREs, we've stated that the operators are applied to "the preceding regular expression." The reason is that parentheses ((...)) provide grouping, to which the operators may then be applied. For example, (why)+ matches one or more occurrences of the word why.
Grouping is particularly valuable (and necessary) when using alternation. It allows you to build complicated and flexible regular expressions. For example, [Tt]he (CPU|computer) is matches sentences using either CPU or computer in between The (or the) and is. Note that here the parentheses are metacharacters, not input text to be matched.
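The effect of the parentheses can be seen by running the pattern with and without them. This is only a sketch: the sample sentences and variable names are invented for the comparison.

```shell
# Hypothetical sample sentences.
lines=$(printf '%s\n' 'The CPU is busy' 'the computer is idle' \
    'The disk is full' 'the CPU was hot' 'my computer is slow')

# With grouping: The/the, then CPU or computer, then is.
grouped=$(printf '%s\n' "$lines" | grep -E '[Tt]he (CPU|computer) is')

# Without grouping, the pattern means "[Tt]he CPU" or "computer is".
ungrouped=$(printf '%s\n' "$lines" | grep -E '[Tt]he CPU|computer is')

echo "$grouped"
echo "$ungrouped"
```

The grouped pattern selects only the first two sentences. The ungrouped one also picks up "the CPU was hot" and "my computer is slow", because each alternative is tried on its own.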
Grouping is also often necessary when using a repetition operator together with alternation. read|write+ matches exactly one occurrence of the word read or an occurrence of the word write, followed by any number of e characters (writee, writeee, and so on). A more useful pattern (and probably what would be meant) is (read|write)+, which matches one or more occurrences of either of the words read or write.
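The difference between the two patterns shows up clearly when each is required to match a whole line. The sketch below uses grep's -x option for that purpose. The sample words and variable names are assumptions made for the illustration.

```shell
# Hypothetical sample words, one per line.
candidates=$(printf '%s\n' read write writee readwrite)

# read|write+ : the + applies only to the final e of write.
ungrouped=$(printf '%s\n' "$candidates" | grep -E -x 'read|write+')

# (read|write)+ : the + applies to the whole group.
grouped=$(printf '%s\n' "$candidates" | grep -E -x '(read|write)+')

echo "$ungrouped"
echo "$grouped"
```

The first pattern accepts read, write, and writee but rejects readwrite. The second accepts read, write, and readwrite but rejects writee.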
Of course, (read|write)+ makes no allowance for intervening whitespace between words. ((read|write)[[:space:]]*)+ is a more complicated, but more realistic, regular expression. At first glance, this looks rather opaque. However, if you break it down into its component parts, from the outside in, it's not too