Professional C++ - Marc Gregoire [257]
Regular Expressions and Raw String Literals
As seen in the preceding sections, regular expressions often use special characters that must be escaped in normal C++ string literals. For example, \d in a regular expression matches any digit. However, because \ is a special character in C++ string literals, you need to write it as \\d in your string literal; otherwise, the C++ compiler tries to interpret \d as an escape sequence. It gets more complicated if you want your regular expression to match a single backslash character \. Because \ is a special character in the regular expression syntax itself, you need to escape it as \\. Each of those two backslashes must in turn be escaped in the C++ string literal, resulting in \\\\.
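The two levels of escaping can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names isSingleDigit() and containsBackslash() are hypothetical, and the sketch assumes the default ECMAScript grammar together with regex_match() and regex_search(), which are covered in later sections.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if the whole of text is a single digit.
// The regular expression is \d, written as "\\d" in a normal string literal.
bool isSingleDigit(const std::string& text)
{
    static const std::regex digit("\\d");
    return std::regex_match(text, digit);
}

// Hypothetical helper: returns true if text contains at least one backslash.
// The regular expression is \\ (an escaped backslash), which requires four
// backslashes in a normal C++ string literal.
bool containsBackslash(const std::string& text)
{
    static const std::regex backslash("\\\\");
    return std::regex_search(text, backslash);
}
```

Calling isSingleDigit("7") should succeed, while isSingleDigit("42") should fail because regex_match() requires the entire source string to match.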
You can use C++11 raw string literals to make complicated regular expressions easier to read in your C++ source code. Raw string literals are explained earlier in this chapter. For example, take the following regular expression, written as a normal string literal:
"( |\\n|\\r|\\\\)"
This regular expression searches for spaces, newlines, carriage returns, and backslashes. As you can see, you need a lot of escape characters. Using raw string literals, this can be replaced with the following more readable regular expression:
R"(( |\n|\r|\\))"
The raw string literal starts with R"( and ends with )". Everything in between is the regular expression. You still need a double backslash at the end, because the backslash must be escaped in the regular expression itself.
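As a quick check, the following sketch stores the pattern once as a normal string literal and once as a raw string literal; both should hold exactly the same characters. The names escapedPattern, rawPattern, and containsSpecial() are hypothetical and exist only for this illustration.

```cpp
#include <regex>
#include <string>

// The same pattern written twice: once with C++ escaping, once as a raw
// string literal. Both variables are hypothetical names for this sketch.
const std::string escapedPattern = "( |\\n|\\r|\\\\)";
const std::string rawPattern = R"(( |\n|\r|\\))";

// Hypothetical helper: returns true if text contains a space, a newline,
// a carriage return, or a backslash.
bool containsSpecial(const std::string& text)
{
    static const std::regex pattern(rawPattern);
    return std::regex_search(text, pattern);
}
```

Comparing escapedPattern == rawPattern is a simple way to confirm that a raw string literal was transcribed correctly from its escaped form.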
This concludes the brief description of the ECMAScript grammar. The following sections show how to actually use regular expressions in your C++11 code.
The regex Library
Everything for the C++11 regular expression library is in the <regex> header file and in the std namespace. The basic templated types defined by the regular expression library are:
basic_regex: An object representing a specific regular expression.
match_results: A substring that matched a regular expression, including all the captured groups. It is a collection of sub_matches.
sub_match: An iterator pair representing a specific matched capture group.
The library provides three key algorithms: regex_match(), regex_search(), and regex_replace(). These are explained in later sections. All of these algorithms have different versions that allow you to specify the source string as an STL string, a character array, or a begin and end iterator pair. The iterators can be any of the following:
const char*
const wchar_t*
string::const_iterator
wstring::const_iterator
In fact, any iterator that behaves as a bidirectional iterator can be used. Iterators are discussed in detail in Chapter 12.
The library also defines regular expression iterators, which are very important if you want to find all occurrences of a pattern in a source string, as you will see in a later section. There are two templated regular expression iterators defined:
regex_iterator: Iterates over all the occurrences of a pattern in a source string.
regex_token_iterator: Iterates over all the capture groups of all occurrences of a pattern in a source string.
To make the library easier to use, the standard defines a number of typedefs for the preceding templates:
typedef basic_regex<char> regex;
typedef basic_regex<wchar_t> wregex;
typedef sub_match<const char*> csub_match;
typedef sub_match<const wchar_t*> wcsub_match;
typedef sub_match<string::const_iterator> ssub_match;
typedef sub_match<wstring::const_iterator> wssub_match;
typedef match_results<const char*> cmatch;
typedef match_results<const wchar_t*> wcmatch;
typedef match_results<string::const_iterator> smatch;
typedef match_results<wstring::const_iterator> wsmatch;
typedef regex_iterator<const char*> cregex_iterator;
typedef regex_iterator<const wchar_t*> wcregex_iterator;
typedef regex_iterator<string::const_iterator> sregex_iterator;
typedef regex_iterator<wstring::const_iterator> wsregex_iterator;
typedef regex_token_iterator<const char*> cregex_token_iterator;
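To give a feel for how the typedefs line up with the different kinds of source strings, here is a minimal sketch. All function names and the date pattern are hypothetical, chosen only for illustration; the algorithms themselves are explained properly in the later sections.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// All function names below are hypothetical, chosen for illustration only.

// Source given as an STL string: the match results are stored in an smatch
// (match_results<string::const_iterator>).
std::string yearFromString(const std::string& date)
{
    static const std::regex pattern(R"((\d{4})/(\d{2})/(\d{2}))");
    std::smatch m;
    if (std::regex_match(date, m, pattern)) {
        return m[1].str();  // m[0] is the whole match, m[1] the first capture group
    }
    return "";
}

// Source given as a character array: the match results are stored in a cmatch
// (match_results<const char*>).
std::string yearFromCString(const char* date)
{
    static const std::regex pattern(R"((\d{4})/(\d{2})/(\d{2}))");
    std::cmatch m;
    if (std::regex_match(date, m, pattern)) {
        return m[1].str();
    }
    return "";
}

// Source given as a begin and end iterator pair, combined with sregex_iterator
// to count every occurrence of a run of digits.
std::size_t countNumbers(const std::string& text)
{
    static const std::regex number(R"(\d+)");
    std::sregex_iterator first(text.cbegin(), text.cend(), number);
    std::sregex_iterator last;  // a default-constructed iterator marks the end
    return static_cast<std::size_t>(std::distance(first, last));
}

// regex_replace() returns a copy of the source with every match replaced.
std::string maskNumbers(const std::string& text)
{
    static const std::regex number(R"(\d+)");
    return std::regex_replace(text, number, "#");
}
```

The only difference between yearFromString() and yearFromCString() is the match results type, which has to agree with the iterator type of the source: smatch for a string, cmatch for a const char*.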