Online Book Reader


Professional C++ - Marc Gregoire [254]

<< L"In the US, the currency symbol is " << dollars << endl;

wcout << L"In Great Britain, the currency symbol is " << pounds << endl;

Code snippet from Facets\use_facet.cpp

REGULAR EXPRESSIONS


Regular expressions are a new and powerful addition to the C++11 Standard Library. They are a special mini-language for string processing. They might seem complicated at first, but once you get to know them, they make working with strings easier. Regular expressions can be used for several string-related operations:

Validation: Check if an input string is well-formed. For example: Is the input string a well-formed phone number?

Decision: Check what kind of string an input represents. For example: Is the input string the name of a JPEG or a PNG file?

Parsing: Extract information from an input string. For example: From a full filename, extract the filename part without the full path and without its extension.

Transformation: Search for sub-strings and replace them with a newly formatted sub-string. For example: Search for all occurrences of “C++11” and replace them with “C++”.

Iteration: Search all occurrences of a sub-string. For example: Extract all phone numbers from an input string.

Tokenization: Split a string into sub-strings based on a set of delimiters. For example: Split a string on whitespace, commas, periods, and so on to extract its individual words.
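As a rough illustration of the last two operations in this list, the following sketch collects all matches of a pattern (iteration) and splits a string on delimiters (tokenization). The helper names findAll and split are hypothetical and chosen only for this example; the library facilities used are std::sregex_iterator and std::sregex_token_iterator from <regex>.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (iteration): collect every sub-string of text
// that matches pattern.
std::vector<std::string> findAll(const std::string& text,
                                 const std::regex& pattern)
{
    std::vector<std::string> results;
    const std::sregex_iterator end;  // a default-constructed iterator marks the end
    for (std::sregex_iterator it(text.begin(), text.end(), pattern);
         it != end; ++it) {
        results.push_back(it->str());  // it->str() is the whole match
    }
    return results;
}

// Hypothetical helper (tokenization): split text on every match of
// delimiters, skipping empty tokens.
std::vector<std::string> split(const std::string& text,
                               const std::regex& delimiters)
{
    std::vector<std::string> tokens;
    const std::sregex_token_iterator end;
    // The -1 argument selects the text between matches rather than the
    // matches themselves.
    for (std::sregex_token_iterator it(text.begin(), text.end(), delimiters, -1);
         it != end; ++it) {
        if (it->length() > 0) {
            tokens.push_back(it->str());
        }
    }
    return tokens;
}
```

For instance, calling split with the string "one, two. three" and a regex built from the pattern "[ ,.]+" should yield the three words one, two, and three.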

Of course you could write your own code to perform any of the preceding operations on your strings, but using the regular expressions feature is highly recommended, because writing correct and safe code to process strings can be tricky.

Before going into more detail on regular expressions, there is some important terminology to know. The following terms are used throughout the discussion:

Pattern: The actual regular expression is a pattern represented by a string.

Match: Determines whether there is a match between a given regular expression and all of the characters in a given sequence [first,last).

Search: Determines whether there is some sub-string within a given sequence [first,last) that matches a given regular expression.

Replace: Identifies sub-strings in a given sequence, and replaces them with a corresponding new sub-string computed from another pattern, called a substitution pattern.
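To make the difference between these terms concrete, here is a minimal sketch using the three corresponding library algorithms: std::regex_match, std::regex_search, and std::regex_replace. The function names and the patterns are assumptions made up for this illustration.

```cpp
#include <regex>
#include <string>

// Match: the pattern must describe the entire sequence.
bool isFourDigits(const std::string& s)
{
    static const std::regex pattern("[0-9]{4}");
    return std::regex_match(s, pattern);
}

// Search: the pattern only has to describe some sub-string of the sequence.
bool containsFourDigits(const std::string& s)
{
    static const std::regex pattern("[0-9]{4}");
    return std::regex_search(s, pattern);
}

// Replace: every matching sub-string is rewritten according to a
// substitution pattern, in which $1 and $2 refer to the two capture groups.
std::string swapKeyAndValue(const std::string& s)
{
    static const std::regex pattern("([a-z]+)=([0-9]+)");
    return std::regex_replace(s, pattern, "$2=$1");
}
```

With these definitions, isFourDigits("in 2011") is false because the match must cover all of the characters, while containsFourDigits("in 2011") is true because a matching sub-string exists.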

Several different grammars for regular expressions are in common use. For this reason, C++11 includes support for several of them: ECMAScript, basic, extended, awk, grep, and egrep. If you already know any of these regular expression grammars, you can use it straight away in C++11 by telling the regular expression library to use that specific syntax (syntax_option_type). The default grammar in C++11 is ECMAScript, whose syntax is explained in detail in the following section. It is also the most powerful grammar, so it’s highly recommended to use ECMAScript instead of one of the other, more limited grammars. Explaining the other regular expression grammars falls outside the scope of this book.
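As a small sketch of how a grammar is selected, the following passes a syntax_option_type flag as the second argument of the std::regex constructor. The function names are hypothetical. The example relies on the fact that + is a repetition operator in ECMAScript but an ordinary character in the POSIX basic grammar.

```cpp
#include <regex>
#include <string>

// Default grammar: equivalent to std::regex("a+", std::regex::ECMAScript).
// Here + means "one or more of the preceding element".
bool matchesWithEcmaScript(const std::string& s)
{
    static const std::regex pattern("a+");
    return std::regex_match(s, pattern);
}

// POSIX basic grammar, selected explicitly with a syntax_option_type flag.
// In this grammar + has no special meaning, so the pattern describes the
// two-character string "a+".
bool matchesWithBasic(const std::string& s)
{
    static const std::regex pattern("a+", std::regex::basic);
    return std::regex_match(s, pattern);
}
```

The same pattern string therefore accepts "aaa" under the default grammar and the literal text "a+" under the basic grammar, which is one reason to stay with a single grammar throughout a code base.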

If this is the first time you have encountered regular expressions, just stick with the powerful default ECMAScript syntax.

ECMAScript Syntax

A regular expression pattern is a sequence of characters representing what you want to match. Any character in the regular expression matches itself except for the following special characters:

^ $ \ . * + ? ( ) [ ] { } |

These special characters are explained throughout the following discussion. If you need to match one of these special characters, you need to escape it using the \ character. For example:

\[ or \. or \* or \\

Don’t forget that you need to escape the backslash in your C++ string literals. For example, if your regular expression needs to match the single * character, you need to escape it for the regular expression engine and for C++, so your C++ string literal should be \\*.
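The double escaping can be sketched as follows. The function names are hypothetical. The second variant uses a C++11 raw string literal, which this section does not mention, as one possible way to avoid the extra level of escaping.

```cpp
#include <regex>
#include <string>

// Returns true if s consists of exactly one * character.
bool isSingleStar(const std::string& s)
{
    // The regular expression engine must see \* so the C++ string
    // literal has to contain \\*.
    static const std::regex star("\\*");
    return std::regex_match(s, star);
}

// Same check written with a raw string literal: everything between
// R"( and )" is taken literally, so a single backslash is enough.
bool isSingleStarRaw(const std::string& s)
{
    static const std::regex star(R"(\*)");
    return std::regex_match(s, star);
}
```

Both functions build the same regular expression; only the spelling of the C++ literal differs.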

Anchors

The special characters ^ and $ are called anchors. The ^ character will match the beginning of the string and $ will match the end of the string. For example,
