Classic Shell Scripting - Arnold Robbins [28]
[...]
Both
Termed a bracket expression, this matches any one of the enclosed characters. A hyphen (-) indicates a range of consecutive characters. (Caution: ranges are locale-sensitive, and thus not portable.) A circumflex (^) as the first character in the brackets reverses the sense: it matches any one character not in the list. A hyphen or close bracket (]) as the first character is treated as a member of the list. All other metacharacters are treated as members of the list (i.e., literally). Bracket expressions may contain collating symbols, equivalence classes, and character classes (described shortly).
\{ n,m \}
BRE
Termed an interval expression, this matches a range of occurrences of the single character that immediately precedes it. \{ n \} matches exactly n occurrences, \{ n, \} matches at least n occurrences, and \{ n,m \} matches any number of occurrences between n and m. n and m must be between 0 and RE_DUP_MAX (minimum value: 255), inclusive.
\( \)
BRE
Save the pattern enclosed between \( and \) in a special holding space. Up to nine subpatterns can be saved in a single pattern. The text matched by the subpatterns can be reused later in the same pattern, by the escape sequences \1 to \9. For example, \(ab\).*\1 matches two occurrences of ab, with any number of characters in between.
\ n
BRE
Replay the nth subpattern enclosed in \( and \) into the pattern at this point. n is a number from 1 to 9, with 1 starting on the left.
{ n,m }
ERE
Just like the BRE \{ n,m \} earlier, but without the backslashes in front of the braces.
+
ERE
Match one or more instances of the preceding regular expression.
?
ERE
Match zero or one instances of the preceding regular expression.
|
ERE
Match the regular expression specified before or after.
( )
ERE
Apply a match to the enclosed group of regular expressions.
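The BRE and ERE forms described above can be tried out with grep, which uses BREs by default and EREs when given the -E option. The following is a minimal sketch, not taken from the book: the sample words and the variable names are invented for illustration, and the C locale is forced so that the results do not depend on the local environment.

```shell
#!/bin/sh
# Sketch only: sample strings and variable names are hypothetical.
LC_ALL=C; export LC_ALL

# BRE interval expression: an a, exactly three b's, then a c.
bre_interval=$(printf '%s\n' abc abbbc abbbbbc | grep 'ab\{3\}c')

# The same interval in ERE syntax: no backslashes before the braces.
ere_interval=$(printf '%s\n' abc abbbc abbbbbc | grep -E 'ab{3}c')

# BRE backreference: \1 replays whatever \(ab\) matched.
bre_backref=$(printf '%s\n' 'ab--ab' 'ab--cd' | grep '\(ab\).*\1')

# ERE grouping with +: one or more copies of "re" before "read".
ere_plus=$(printf '%s\n' read reread write | grep -E '^(re)+read$')

# ERE ?: the u is optional.
ere_optional=$(printf '%s\n' color colour colr | grep -E '^colou?r$')

# ERE alternation inside a group: either whole word.
ere_alt=$(printf '%s\n' read write seek | grep -E '^(read|write)$')

printf 'BRE interval:  %s\n' "$bre_interval"
printf 'ERE interval:  %s\n' "$ere_interval"
printf 'BRE backref:   %s\n' "$bre_backref"
printf 'ERE plus:      %s\n' "$ere_plus"
printf 'ERE optional:  %s\n' $ere_optional
printf 'ERE alternate: %s\n' $ere_alt
```

Anchoring the ERE patterns with ^ and $ keeps each test to whole lines, which makes it easier to see which operator is responsible for a match.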
Table 3-2 presents some simple examples.
Table 3-2. Simple regular expression matching examples
Expression   Matches
tolstoy      The seven letters tolstoy, anywhere on a line
^tolstoy     The seven letters tolstoy, at the beginning of a line
tolstoy$     The seven letters tolstoy, at the end of a line
^tolstoy$    A line containing exactly the seven letters tolstoy, and nothing else
[Tt]olstoy   Either the seven letters Tolstoy, or the seven letters tolstoy, anywhere on a line
tol.toy      The three letters tol, any character, and the three letters toy, anywhere on a line
tol.*toy     The three letters tol, any sequence of zero or more characters, and the three letters toy, anywhere on a line (e.g., toltoy, tolstoy, tolWHOtoy, and so on)
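One way to check the expressions in Table 3-2 is to count how many lines of a small sample each one matches. The sketch below assumes a six-line sample invented for this purpose, and the helper name count_matches is hypothetical. It relies only on grep -c, which prints the number of matching lines.

```shell
#!/bin/sh
# Sketch only: the sample text and count_matches are illustrative assumptions.
LC_ALL=C; export LC_ALL

sample='tolstoy
Leo Tolstoy
tolstoy wrote
by tolstoy
toltoy
tolWHOtoy'

# Print the number of sample lines matched by the BRE given as $1.
count_matches() {
    printf '%s\n' "$sample" | grep -c -- "$1"
}

# Report the count for each expression from the table.
for re in 'tolstoy' '^tolstoy' 'tolstoy$' '^tolstoy$' \
          '[Tt]olstoy' 'tol.toy' 'tol.*toy'
do
    printf '%-12s %s\n' "$re" "$(count_matches "$re")"
done
```

With this sample, tol.toy does not match toltoy or tolWHOtoy, because the dot stands for exactly one character, while tol.*toy matches both.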
POSIX bracket expressions
In order to accommodate non-English environments, the POSIX standard enhanced the ability of character set ranges (e.g., [a-z]) to match characters not in the English alphabet. For example, the French è is an alphabetic character, but the typical character class [a-z] would not match it. Additionally, the standard provides for sequences of characters that should be treated as a single unit when matching and collating (sorting) string data. (For example, there are locales where the two characters ch are treated as a unit, and must be matched and sorted that way.) The growing popularity of the Unicode character set standard adds further complications to the use of simple ranges, making them even less appropriate for modern applications.
POSIX also changed what had been common terminology. What we saw earlier as a range expression is often called a "character class" in the Unix literature. It is now called a bracket expression in the POSIX standard. Within "bracket expressions," besides literal characters such as z, ;, and so on, you can have additional components. These are:
Character classes
A POSIX character class consists of keywords bracketed by [: and :]. The keywords describe