Classic Shell Scripting - Arnold Robbins [32]
$ getconf RE_DUP_MAX
32767
Anchoring text matches
Two additional metacharacters round out our discussion of BREs. These are the caret (^) and the dollar sign ($). These characters are called anchors because they restrict the regular expression to matching at the beginning or end, respectively, of the string being matched against. (This use of ^ is entirely separate from the use of ^ to complement the list of characters inside a bracket expression.) Assuming that the text to be matched is abcABCdefDEF, Table 3-4 provides some examples:
Table 3-4. Examples of anchors in regular expressions
Pattern              Matches?  Text matched (set off by spaces) / Reason match fails
ABC                  Yes       Characters 4, 5, and 6, in the middle: abc ABC defDEF
^ABC                 No        Match is restricted to beginning of string
def                  Yes       Characters 7, 8, and 9, in the middle: abcABC def DEF
def$                 No        Match is restricted to end of string
[[:upper:]]\{3\}     Yes       Characters 4, 5, and 6, in the middle: abc ABC defDEF
[[:upper:]]\{3\}$    Yes       Characters 10, 11, and 12, at the end: abcABCdef DEF
^[[:alpha:]]\{3\}    Yes       Characters 1, 2, and 3, at the beginning: abc ABCdefDEF
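The rows of Table 3-4 can be checked with grep. The following is a minimal sketch, not part of the original text: try_bre is a hypothetical helper name, and the script reports only whether each pattern matches, not which characters matched.

```shell
#!/bin/sh
# Sample text taken from the discussion above.
text='abcABCdefDEF'

# try_bre (hypothetical helper): report whether the BRE in $1 matches $text.
try_bre() {
    if printf '%s\n' "$text" | grep -q -e "$1"; then
        echo "Yes"
    else
        echo "No"
    fi
}

# Run every pattern from Table 3-4 against the sample text.
for pattern in 'ABC' '^ABC' 'def' 'def$' \
               '[[:upper:]]\{3\}' '[[:upper:]]\{3\}$' '^[[:alpha:]]\{3\}'
do
    printf '%-22s %s\n' "$pattern" "$(try_bre "$pattern")"
done
```

The two anchored patterns ^ABC and def$ should report No, since the text begins with abc and ends with DEF; the remaining patterns should report Yes.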
^ and $ may be used together, in which case the enclosed regular expression must match the entire string (or line). It is also occasionally useful to use the simple regular expression ^$, which matches empty strings or lines. Combined with the -v option to grep, which prints all lines that don't match a pattern, ^$ can be used to filter out empty lines from a file.
For example, it's sometimes useful to look at C source code after it has been processed for #include files and #define macros so that you can see exactly what the C compiler sees. (This is low-level debugging, but sometimes it's what you have to do.) Expanded files often contain many more blank or empty lines than lines of source text: thus it's useful to exclude empty lines:
$ cc -E foo.c | grep -v '^$' > foo.out        Preprocess, remove empty lines
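The same filter can be tried without a C compiler. The sketch below runs a few hand-made lines (not real preprocessor output) through grep -v '^$'. It also shows that ^$ matches only truly empty lines: a line holding a single space survives. The second pattern, ^[[:space:]]*$, is an addition that does not appear in the text above; it is one possible way to drop whitespace-only lines as well.

```shell
#!/bin/sh
# Hand-made sample input: one empty line and one line holding a single space.
sample() {
    printf '%s\n' 'int x;' '' ' ' 'int y;'
}

# Drop empty lines only; the whitespace-only line is kept.
no_empty=$(sample | grep -v '^$')

# Assumption beyond the text: also drop lines made up solely of whitespace.
no_blank=$(sample | grep -v '^[[:space:]]*$')

printf '%s\n' "$no_empty" | wc -l    # three lines remain
printf '%s\n' "$no_blank" | wc -l    # two lines remain
```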
^ and $ are special only at the beginning or end of a BRE, respectively. In a BRE such as ab^cd, the ^ stands for itself. Likewise, in ef$gh, the $ stands for itself. And, as with any other metacharacter, \^ and \$ may be used to match the literal characters, as may [$].[3]
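A short sketch of these rules using grep follows. The sample lines are made up for illustration; each command should print exactly the line named in its comment.

```shell
#!/bin/sh
# Made-up sample lines, some containing a literal ^ or $.
lines() {
    printf '%s\n' 'ab^cd' 'abcd' 'ef$gh' 'efgh' 'total$' 'grand total'
}

lines | grep 'ab^cd'      # ^ in the middle is literal: prints ab^cd
lines | grep 'ef$gh'      # $ in the middle is literal: prints ef$gh
lines | grep 'total$'     # $ at the end is an anchor: prints grand total
lines | grep 'total\$'    # escaped $ is literal: prints total$
lines | grep 'total[$]'   # bracket expression, same effect: prints total$
```

The single quotes matter here: they keep the shell from treating $ and \ specially before grep sees the pattern.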
BRE operator precedence
As in mathematical expressions, the regular expression operators have a certain defined precedence. This means that certain operators are applied before (have higher precedence than) other operators. Table 3-5 provides the precedence for the BRE operators, from highest to lowest.
Table 3-5. BRE operator precedence from highest to lowest
Operator              Meaning
[. .] [= =] [: :]     Bracket symbols for character collation
\metacharacter        Escaped metacharacters
[ ]                   Bracket expressions
\( \) \digit          Subexpressions and backreferences
* \{ \}               Repetition of the preceding single-character regular expression
no symbol             Concatenation
^ $                   Anchors
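One practical consequence of Table 3-5 is that repetition binds more tightly than concatenation. The sketch below illustrates this with grep; matches is a hypothetical helper name, and the test strings are made up.

```shell
#!/bin/sh
# matches (hypothetical helper): print "yes" if the BRE in $1 matches the line $2.
matches() {
    if printf '%s\n' "$2" | grep -q -e "$1"; then
        echo yes
    else
        echo no
    fi
}

# Repetition outranks concatenation: in ab*, the * applies to b alone.
matches '^ab*$' 'abbb'        # yes
matches '^ab*$' 'abab'        # no

# Grouping with \( \) makes the * apply to the whole subexpression ab.
matches '^\(ab\)*$' 'abab'    # yes
matches '^\(ab\)*$' 'abbb'    # no
```

Because anchors have the lowest precedence, the ^ and $ in these patterns apply to the entire concatenated expression, which is what forces a whole-line match.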
Extended Regular Expressions
EREs, as the name implies, have more capabilities than do basic regular expressions. Many of the metacharacters and capabilities are identical. However, some of the metacharacters that look similar to their BRE counterparts have different meanings.
Matching single characters
When it comes to matching single characters, EREs are essentially the same as BREs. In particular, normal characters, the backslash character for escaping metacharacters, and bracket expressions all behave as described earlier for BREs.
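As a quick check of this claim, the sketch below runs three single-character patterns (an ordinary string, an escaped metacharacter, and a bracket expression) through both grep, which uses BREs, and grep -E, which uses EREs. The helper name compare_flavors and the sample lines are made up for illustration.

```shell
#!/bin/sh
# Made-up sample lines.
sample() {
    printf '%s\n' 'a.c' 'abc' 'a-c' 'xyz'
}

# compare_flavors (hypothetical helper): run one pattern through grep (BRE)
# and grep -E (ERE) and report whether the matched lines agree.
compare_flavors() {
    bre=$(sample | grep -e "$1")
    ere=$(sample | grep -E -e "$1")
    if [ "$bre" = "$ere" ]; then
        echo same
    else
        echo differ
    fi
}

for pattern in 'abc' 'a\.c' 'a[[:punct:]]c'; do
    printf '%-16s %s\n' "$pattern" "$(compare_flavors "$pattern")"
done
```

Each of the three patterns should report "same".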
One notable exception is that in awk, \ is special inside bracket expressions. Thus, to match a left bracket, dash, right bracket, or backslash, you could use [\[\-\]\\]. Again, this reflects historical practice.
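The awk bracket expression can be tried from the shell as follows. This is a sketch under assumptions: classify is a hypothetical helper name, the sample strings are made up, and the input exercises only the bracket and backslash cases. Behavior was reasoned about for common awk implementations and may vary in older ones.

```shell
#!/bin/sh
# classify (hypothetical helper): report whether $1 contains one of the
# characters in the awk bracket expression [\[\-\]\\].
classify() {
    printf '%s\n' "$1" |
        awk '/[\[\-\]\\]/ { print "special"; next } { print "plain" }'
}

classify 'a[b'     # special
classify 'a]b'     # special
classify 'a\b'     # special
classify 'abc'     # plain
```

The awk program is single-quoted so that the shell passes every backslash through to awk unchanged.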
Backreferences don't exist
Backreferences don't exist in EREs.[4] Parentheses are