Classic Shell Scripting - Arnold Robbins [32]

RE_DUP_MAX is a symbolic constant defined by POSIX and available via the getconf command. The minimum value for RE_DUP_MAX is 255; some systems allow larger values. On one of our GNU/Linux systems, it's quite large:

$ getconf RE_DUP_MAX

32767

Anchoring text matches

Two additional metacharacters round out our discussion of BREs. These are the caret (^) and the dollar sign ($). These characters are called anchors because they restrict the regular expression to matching at the beginning or end, respectively, of the string being matched against. (This use of ^ is entirely separate from the use of ^ to complement the list of characters inside a bracket expression.) Assuming that the text to be matched is abcABCdefDEF, Table 3-4 provides some examples:

Table 3-4. Examples of anchors in regular expressions

Pattern              Matches?  Text matched (set off by spaces) / Reason match fails
ABC                  Yes       Characters 4, 5, and 6, in the middle: abc ABC defDEF
^ABC                 No        Match is restricted to beginning of string
def                  Yes       Characters 7, 8, and 9, in the middle: abcABC def DEF
def$                 No        Match is restricted to end of string
[[:upper:]]\{3\}     Yes       Characters 4, 5, and 6, in the middle: abc ABC defDEF
[[:upper:]]\{3\}$    Yes       Characters 10, 11, and 12, at the end: abcABCdef DEF
^[[:alpha:]]\{3\}    Yes       Characters 1, 2, and 3, at the beginning: abc ABCdefDEF
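The anchor examples can be checked mechanically with grep. The following is a minimal sketch, not part of the original text: the helper name matches and the variable text are made up for illustration, and grep -q is used only to report whether a match exists.

```shell
# Sample text from the discussion above.
text='abcABCdefDEF'

# matches PATTERN: succeed if the BRE matches somewhere in $text.
# (Hypothetical helper; grep -q prints nothing and reports the
# result through its exit status.)
matches() {
    printf '%s\n' "$text" | grep -q -- "$1"
}

# Try each pattern from the table and report Yes or No.
for pattern in 'ABC' '^ABC' 'def' 'def$' \
               '[[:upper:]]\{3\}' '[[:upper:]]\{3\}$' '^[[:alpha:]]\{3\}'
do
    if matches "$pattern"; then
        printf '%-22s Yes\n' "$pattern"
    else
        printf '%-22s No\n' "$pattern"
    fi
done
```

Each pattern is single-quoted so that the shell passes the backslashes and dollar signs to grep untouched.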

^ and $ may be used together, in which case the enclosed regular expression must match the entire string (or line). It is also useful occasionally to use the simple regular expression ^$, which matches empty strings or lines. Together with the -v option to grep, which prints all lines that don't match a pattern, these can be used to filter out empty lines from a file.
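As a rough illustration of using both anchors at once, the sketch below counts matching lines with grep -c. The sample function and its four input lines are invented for this example.

```shell
# Hypothetical sample input: four lines, the third of which is empty.
sample() {
    printf '%s\n' 'abc' 'xabcx' '' 'abc'
}

anywhere=$(sample | grep -c 'abc')     # lines containing abc somewhere
whole=$(sample | grep -c '^abc$')      # lines that are exactly abc
empty=$(sample | grep -c '^$')         # empty lines
kept=$(sample | grep -c -v '^$')       # lines that are not empty

printf 'anywhere=%s whole=%s empty=%s kept=%s\n' \
    "$anywhere" "$whole" "$empty" "$kept"
```

With this input, the unanchored pattern counts three lines, the doubly anchored pattern counts only the two lines that consist of abc and nothing else, and ^$ picks out the single empty line.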

For example, it's sometimes useful to look at C source code after it has been processed for #include files and #define macros so that you can see exactly what the C compiler sees. (This is low-level debugging, but sometimes it's what you have to do.) Expanded files often contain many more blank or empty lines than lines of source text: thus it's useful to exclude empty lines:

$ cc -E foo.c | grep -v '^$' > foo.out

Preprocess, remove empty lines
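Note that ^$ matches only lines with nothing on them at all; a line holding just spaces or tabs still gets through. One possible variation, shown here as a sketch rather than as part of the original command, is to allow any amount of whitespace between the anchors. The sample function and its input are made up for illustration.

```shell
# Hypothetical sample input: a code line, an empty line, a line holding
# only a space and a tab, and another code line.
sample() {
    printf 'int x;\n\n \t\nint y;\n'
}

# '^$' drops only the truly empty line.
no_empty=$(sample | grep -v '^$')

# '^[[:space:]]*$' also drops lines made up solely of whitespace.
no_blank=$(sample | grep -v '^[[:space:]]*$')

printf '%s\n' "$no_blank"
```

Here no_empty still holds three lines, including the whitespace-only one, while no_blank holds just the two lines of code.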

^ and $ are special only at the beginning or end of a BRE, respectively. In a BRE such as ab^cd, the ^ stands for itself. Likewise, in ef$gh the $ stands for itself. And, as with any other metacharacter, \^ and \$ may be used to match the literal characters, as may [$].[3]
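A quick way to see this is to feed grep lines that contain literal carets and dollar signs. This is a small sketch with invented sample text; the variable names are hypothetical.

```shell
# In the middle of a BRE, ^ and $ are ordinary characters.
carets=$(printf 'ab^cd\nabcd\n' | grep -c 'ab^cd')
dollars=$(printf 'ef$gh\nefgh\n' | grep -c 'ef$gh')

# A backslash or a bracket expression also yields a literal match.
escaped=$(printf 'cost: $5\ncost: 5\n' | grep -c '\$5')
bracketed=$(printf 'cost: $5\ncost: 5\n' | grep -c '[$]5')

printf 'carets=%s dollars=%s escaped=%s bracketed=%s\n' \
    "$carets" "$dollars" "$escaped" "$bracketed"
```

In each case only the line that actually contains the caret or dollar sign is counted, so all four counts come out as 1.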

BRE operator precedence

As in mathematical expressions, the regular expression operators have a certain defined precedence. This means that certain operators are applied before (have higher precedence than) other operators. Table 3-5 provides the precedence for the BRE operators, from highest to lowest.

Table 3-5. BRE operator precedence from highest to lowest

Operator              Meaning
[. .]  [= =]  [: :]   Bracket symbols for character collation
\metacharacter        Escaped metacharacters
[ ]                   Bracket expressions
\( \)  \digit         Subexpressions and backreferences
*  \{ \}              Repetition of the preceding single-character regular expression
no symbol             Concatenation
^  $                  Anchors
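To make the effect of precedence concrete, here is a small sketch comparing ab* with \(ab\)*. Because repetition binds more tightly than concatenation, the * in ab* applies only to the b; grouping with \( \) is needed to repeat the pair. The sample lines and variable names are invented for illustration.

```shell
# Hypothetical sample input: two lines built from a's and b's.
sample() {
    printf '%s\n' 'abbb' 'abab'
}

# a followed by zero or more b's: * applies to b alone.
rep_single=$(sample | grep '^ab*$')

# Zero or more repetitions of the pair ab: \( \) groups before * applies.
rep_group=$(sample | grep '^\(ab\)*$')

printf 'ab*     matched: %s\n' "$rep_single"
printf '\\(ab\\)* matched: %s\n' "$rep_group"
```

The first pattern selects abbb and the second selects abab. The anchors, having the lowest precedence, apply to the whole of each expression rather than to the single character next to them.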

Extended Regular Expressions

EREs, as the name implies, have more capabilities than do basic regular expressions. Many of the metacharacters and capabilities are identical. However, some of the metacharacters that look similar to their BRE counterparts have different meanings.

Matching single characters

When it comes to matching single characters, EREs are essentially the same as BREs. In particular, normal characters, the backslash character for escaping metacharacters, and bracket expressions all behave as described earlier for BREs.

One notable exception is that in awk, \ is special inside bracket expressions. Thus, to match a left bracket, dash, right bracket, or backslash, you could use [\[\-\]\\]. Again, this reflects historical practice.
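The following sketch tries such a bracket expression in awk, escaping each of the four characters with a backslash. The sample lines are invented, and behavior may vary somewhat among awk implementations, so treat it as an experiment to run on your own system rather than a guarantee.

```shell
# Hypothetical sample input: one line for each special character, plus
# one line ("plain") that contains none of them.
sample() {
    printf 'a[b\nplain\nx-y\nc]d\nback\\slash\n'
}

# Count lines containing a left bracket, dash, right bracket, or backslash.
# Inside the bracket expression, each character is escaped with \.
special=$(sample | awk '/[\[\-\]\\]/ { n++ } END { print n + 0 }')

printf 'lines with a special character: %s\n' "$special"
```

The awk program is single-quoted so that the shell leaves the backslashes alone; the n + 0 in the END rule forces a numeric 0 if nothing matched.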

Backreferences don't exist

Backreferences don't exist in EREs.[4] Parentheses are
