Online Book Reader


Professional C++ - Marc Gregoire [256]

a, b or c. If the first character is ^, it means “any but”:

ab[cde] matches abc, abd, and abe.

ab[^cde] matches abf, abp, and so on but not abc, abd, and abe.
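The two character sets above can be tried out with std::regex, which uses the ECMAScript grammar by default. The following is only a minimal sketch; the helper name matchesWhole is an assumption made for illustration and is not from the book.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns true if the entire string matches the pattern.
// std::regex defaults to the ECMAScript grammar.
bool matchesWhole(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    return std::regex_match(text, re);
}
```

With this helper, matchesWhole("abd", "ab[cde]") should be true, while matchesWhole("abd", "ab[^cde]") should be false because d is excluded by the negated set.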

If you need to match the ^, [, or ] characters themselves, you need to escape them. For example, [\[\^\]] matches the characters [, ^, or ].

If you want to specify all letters, you could use a character set like [abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]. However, this is clumsy, and repeating it several times is error-prone: a typo can easily omit one of the letters. There are two solutions to this.

The range specification in square brackets allows you to write [a-zA-Z], which recognizes all the letters in the ranges a to z and A to Z. If you need to match a hyphen, you need to escape it. For example, [a-zA-Z\-]* matches any word, including a hyphenated word.
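As a rough illustration of the escaped hyphen, the sketch below wraps the pattern in a small function. The name isLettersAndHyphens is hypothetical. A raw string literal is used so that the backslash reaches the regex engine unchanged.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the text consists only of letters and hyphens.
// Because of the * quantifier, the empty string is accepted as well.
bool isLettersAndHyphens(const std::string& text)
{
    // R"(...)" is a raw string literal, so \- does not need a double backslash.
    static const std::regex re(R"([a-zA-Z\-]*)");
    return std::regex_match(text, re);
}
```

Under these assumptions, "well-known" is accepted, while "route66" is rejected because digits are not part of the set.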

Another capability is to use one of the character classes. These are used to denote specific types of characters and are represented as [:name:] where name is one of the classes in the following table:

CHARACTER CLASS NAME   DESCRIPTION
alnum                  lowercase letters, uppercase letters, and digits
alpha                  lowercase letters and uppercase letters
blank                  space or tab characters
cntrl                  file format escape characters like newlines, form feeds, and so on (\f, \n, \r, \t, and \v)
digit                  digits
graph                  lowercase letters, uppercase letters, digits, and punctuation characters
lower                  lowercase letters
print                  lowercase letters, uppercase letters, digits, punctuation characters, and space characters
punct                  punctuation characters
space                  space characters
upper                  uppercase letters
xdigit                 digits and ‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’, ‘A’, ‘B’, ‘C’, ‘D’, ‘E’, ‘F’
d                      same as digit
s                      same as space
w                      same as alnum, plus the underscore character (_)

Character classes are used within character sets, for example [[:alpha:]]* in English means the same as [a-zA-Z]*.
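A character class inside a set can also be used to extract text rather than just test it. The sketch below pulls the first run of letters out of a string with std::regex_search and std::smatch. The function name firstAlphaRun is an assumption for illustration, and the pattern uses + instead of * so that an empty match at the start of the string is not reported.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the first run of alphabetic characters,
// or an empty string if the text contains no letters at all.
std::string firstAlphaRun(const std::string& text)
{
    static const std::regex re("[[:alpha:]]+");
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        return m.str();  // m.str() is the complete match (capture group 0)
    }
    return "";
}
```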

Because certain concepts like matching digits are so common, there are shorthand patterns for them. For example, [[:digit:]] and [[:d:]] mean the same thing as [0-9]. Some classes have an even shorter pattern using the escape notation \. For example, \d means [[:digit:]]. Therefore, to recognize a sequence of one or more digits, you can write any of the following patterns:

[0-9]+

[[:digit:]]+

[[:d:]]+

\d+
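To check that the four spellings behave the same way, one option is to run a string through all of them and count how many accept it. This is only a sketch; countMatchingDigitPatterns is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: counts how many of the four equivalent patterns
// match the whole text. For consistent patterns the result is 0 or 4.
int countMatchingDigitPatterns(const std::string& text)
{
    const char* patterns[] = { "[0-9]+", "[[:digit:]]+", "[[:d:]]+", R"(\d+)" };
    int count = 0;
    for (const char* pattern : patterns) {
        if (std::regex_match(text, std::regex(pattern))) {
            ++count;
        }
    }
    return count;
}
```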

The following table lists the available escape notations for character classes:

ESCAPE NOTATION   EQUIVALENT TO
\d                [[:d:]]
\D                [^[:d:]]
\s                [[:s:]]
\S                [^[:s:]]
\w                [[:w:]]
\W                [^[:w:]]
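The negated escapes are handy with std::regex_replace. The sketch below uses \D to strip everything that is not a digit and \s+ to collapse runs of whitespace. Both function names are assumptions made for this example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: removes every character that is not a digit.
std::string keepDigits(const std::string& text)
{
    static const std::regex nonDigit(R"(\D)");
    return std::regex_replace(text, nonDigit, "");
}

// Hypothetical helper: replaces each run of whitespace with a single space.
std::string collapseWhitespace(const std::string& text)
{
    static const std::regex spaces(R"(\s+)");
    return std::regex_replace(text, spaces, " ");
}
```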

Some examples:

Test[5-8] will match Test5, Test6, Test7, and Test8.

[[:lower:]] will match a, b, and so on but not A, B, and so on.

[^[:lower:]] will match any character except lowercase letters like a, b, and so on.

[[:lower:]5-7] will match any lowercase letter like a, b, and so on and will also match the digits 5, 6, and 7.
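The last example, which mixes a character class and a range in one set, can be explored by counting how many characters of a string it matches. The sketch below uses std::sregex_iterator for that. The name countLowerOr5to7 is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts the characters that are either a lowercase
// letter or one of the digits 5, 6, or 7.
std::ptrdiff_t countLowerOr5to7(const std::string& text)
{
    static const std::regex re("[[:lower:]5-7]");
    // Each match is a single character, so the number of matches
    // equals the number of matching characters.
    return std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                         std::sregex_iterator());
}
```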

Word Boundaries

A word boundary can mean the following:

The beginning of the source string if the first character of the source string is one of the word characters [A-Za-z0-9_].

The end of the source string if the last character of the source string is one of the word characters.

The first character of a word, which is one of the word characters, while the preceding character is not a word character.

The end of a word, which is marked by a non-word character that directly follows a word character.

You can use \b to match a word boundary, and \B to match anything except a word boundary.
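A small sketch can make the difference between \b and \B more concrete. It searches for cat as a whole word and, separately, for cat embedded inside a longer word. The function names and the choice of cat are assumptions made for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if "cat" occurs as a whole word.
bool hasWholeWordCat(const std::string& text)
{
    static const std::regex re(R"(\bcat\b)");
    return std::regex_search(text, re);
}

// Hypothetical helper: true if "cat" occurs with word characters
// directly before and after it, that is, with no boundary on either side.
bool hasEmbeddedCat(const std::string& text)
{
    static const std::regex re(R"(\Bcat\B)");
    return std::regex_search(text, re);
}
```

Note that a string consisting of just cat satisfies hasWholeWordCat, because the beginning and end of the source string count as word boundaries when they are adjacent to word characters.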

Back References

Back references allow you to reference a captured group inside the regular expression itself: \n refers to the n-th captured group. The 0-th capture group is equal to the complete match. For example, the regular expression ^(\d+)-.*-\1$ matches a string that has the following format:

The beginning of the string ^

followed by one or more digits captured in a capture group (\d+)

followed by a dash -

followed by zero or more characters .*

followed by another dash -

followed by exactly the same digits captured by the first capture group \1

followed by the end of the string $
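The pattern broken down above can be tried with std::regex as follows. This is a minimal sketch; the function name hasMatchingEnds is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the text starts with digits, ends with
// exactly the same digits, and has dash-separated content in between.
bool hasMatchingEnds(const std::string& text)
{
    // \1 refers back to whatever the first capture group (\d+) matched.
    static const std::regex re(R"(^(\d+)-.*-\1$)");
    return std::regex_match(text, re);
}
```

A string such as 123-abc-1234 is rejected under this pattern, because the trailing digits differ from those captured at the start.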

This regular expression will match 123-abc-123, 1234-a-1234,
