Professional C++ - Marc Gregoire [262]

    sregex_token_iterator end;
    for (sregex_token_iterator iter(str.begin(), str.end(), reg);
         iter != end; ++iter) {
        cout << "\"" << *iter << "\"" << endl;
    }
}

Code snippet from RegularExpressions\regex_token_iterator_1.cpp

The following example asks the user to enter a date and then uses a regex_token_iterator to iterate over the second and third capture groups (month and day), which are specified by using a vector. The regular expression used for dates is explained in an earlier section in this chapter:

regex reg("^(\\d{4})/(0?[1-9]|1[0-2])/(0?[1-9]|[1-2][0-9]|3[0-1])$");
while (true) {
    cout << "Enter a date (year/month/day) (q=quit): ";
    string str;
    if (!getline(cin, str) || str == "q")
        break;
    // Iterate over capture groups 2 (month) and 3 (day) only.
    vector<int> vec = {2, 3};
    const sregex_token_iterator end;
    for (sregex_token_iterator iter(str.begin(), str.end(), reg, vec);
         iter != end; ++iter) {
        cout << "\"" << *iter << "\"" << endl;
    }
}

Code snippet from RegularExpressions\regex_token_iterator_2.cpp
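The same capture-group iteration can be tried without console input. The following is only a minimal sketch: the function name extractMonthAndDay() is hypothetical and does not appear in the book, and it returns the tokens in a vector instead of printing them. It assumes the same date pattern as the loop above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the book): returns the month and day
// sub-strings of a year/month/day date, or an empty vector when the
// input does not match the date pattern.
std::vector<std::string> extractMonthAndDay(const std::string& str)
{
    // Same pattern as in the text. The backslash is doubled because this
    // is an ordinary (non-raw) string literal.
    static const std::regex reg(
        "^(\\d{4})/(0?[1-9]|1[0-2])/(0?[1-9]|[1-2][0-9]|3[0-1])$");

    // Indices of the capture groups to iterate over: 2 = month, 3 = day.
    const std::vector<int> groups = {2, 3};

    std::vector<std::string> result;
    const std::sregex_token_iterator end;
    for (std::sregex_token_iterator iter(str.begin(), str.end(), reg, groups);
         iter != end; ++iter) {
        result.push_back(iter->str());
    }
    return result;
}
```

With this sketch, extractMonthAndDay("2011/12/5") yields the two strings 12 and 5, while an invalid date such as 2011/1/32 yields an empty vector, which mirrors what the interactive loop above prints.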

This code prints only the month and day of valid dates. Output generated by this example can look as follows:

Enter a date (year/month/day) (q=quit): 2011/1/13
"1"
"13"
Enter a date (year/month/day) (q=quit): 2011/1/32
Enter a date (year/month/day) (q=quit): 2011/12/5
"12"
"5"

The regex_token_iterator can also be used to perform so-called field splitting or tokenization. It is a much safer and more flexible alternative to the old strtok() function. Tokenization is triggered in the regex_token_iterator constructor by specifying -1 as the capture group index to iterate over. When in tokenization mode, the iterator iterates over all sub-strings of the source string that do not match the regular expression. The following code demonstrates this by tokenizing a string on the delimiters , and ; with any number of whitespace characters before or after the delimiters:

regex reg("\\s*[,;]+\\s*");
while (true) {
    cout << "Enter a string to split on ',' and ';' (q=quit): ";
    string str;
    if (!getline(cin, str) || str == "q")
        break;
    // -1 selects tokenization mode: iterate over the parts that do NOT match.
    const sregex_token_iterator end;
    for (sregex_token_iterator iter(str.begin(), str.end(), reg, -1);
         iter != end; ++iter) {
        cout << "\"" << *iter << "\"" << endl;
    }
}

Code snippet from RegularExpressions\regex_token_iterator_field_splitting.cpp

The regular expression in this example searches for patterns that match the following:

Zero or more whitespace characters,

followed by 1 or more , or ; characters,

followed by zero or more whitespace characters.

The output can be as follows:

Enter a string to split on ',' and ';' (q=quit): This is, a; test string.
"This is"
"a"
"test string."

As you can see from this output, the string is split on , and ; and all whitespace characters around the delimiters are removed. This is because the tokenization iterator iterates over all sub-strings that do not match the regular expression, and the regular expression matches , and ; together with the whitespace around them.
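As a sketch of how tokenization mode might be packaged for reuse, the following hypothetical helper, splitFields() (a name that does not appear in the book), collects the tokens into a vector instead of printing them. It assumes the same delimiter pattern as the example above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the book): splits str on ',' and ';',
// discarding any whitespace that surrounds the delimiters.
std::vector<std::string> splitFields(const std::string& str)
{
    // Doubled backslashes: \s must reach the regex engine intact.
    static const std::regex reg("\\s*[,;]+\\s*");

    std::vector<std::string> tokens;
    const std::sregex_token_iterator end;
    // Passing -1 selects tokenization mode: the iterator visits the
    // sub-strings between matches rather than the matches themselves.
    for (std::sregex_token_iterator iter(str.begin(), str.end(), reg, -1);
         iter != end; ++iter) {
        tokens.push_back(iter->str());
    }
    return tokens;
}
```

Calling splitFields("This is, a; test string.") returns the three strings shown in the output above: This is, a, and test string.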

regex_replace()

The regex_replace() algorithm requires a regular expression and a formatting string that is used to replace matching sub-strings. This formatting string can reference parts of the matched sub-strings by using the following escape sequences:

ESCAPE SEQUENCE    REPLACED WITH
$n                 the string matching the n-th capture group; for example, $1 for the first capture group, $2 for the second, and so on
$&                 the string matching the whole regular expression, which is the same as $0
$`                 the part of the source string that appears to the left of the sub-string matching the regular expression
$'                 the part of the source string that appears to the right of the sub-string matching the regular expression
$$                 a dollar sign
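To make these escape sequences concrete, here is a minimal sketch that uses the overload of regex_replace() taking a std::string source and returning a new std::string. The helper names and the sample patterns are illustrative assumptions, not code from the book.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites year/month/day as day.month.year by
// referencing the capture groups with $1, $2, and $3.
std::string reorderDate(const std::string& str)
{
    static const std::regex reg("(\\d{4})/(\\d{1,2})/(\\d{1,2})");
    return std::regex_replace(str, reg, "$3.$2.$1");
}

// Hypothetical helper: wraps every run of digits in brackets; $& stands
// for the whole match.
std::string bracketNumbers(const std::string& str)
{
    static const std::regex reg("\\d+");
    return std::regex_replace(str, reg, "[$&]");
}

// Hypothetical helper: prefixes every run of digits with a literal dollar
// sign; $$ produces the '$' and $& the matched digits.
std::string addDollarSign(const std::string& str)
{
    static const std::regex reg("\\d+");
    return std::regex_replace(str, reg, "$$$&");
}

// Hypothetical helper: replaces each 'b' with the text to its left ($`)
// and to its right ($'), separated by '|'.
std::string showContext(const std::string& str)
{
    static const std::regex reg("b");
    return std::regex_replace(str, reg, "[$`|$']");
}
```

For example, reorderDate("Due 2011/12/5.") returns Due 5.12.2011., bracketNumbers("a1b22") returns a[1]b[22], addDollarSign("cost 15") returns cost $15, and showContext("abc") returns a[a|c]c.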

There are six versions of the regex_replace() algorithm. The difference between them is in the type of arguments:

template <class OutputIterator, class BidirectionalIterator,
          class traits, class charT, class ST, class SA>
OutputIterator
regex_replace(OutputIterator out,
              BidirectionalIterator first,
              BidirectionalIterator last,
              const basic_regex<charT, traits>& e,
              const basic_string<charT, ST, SA>& fmt,
