Professional C++ - Marc Gregoire [262]
    for (sregex_token_iterator iter(str.begin(), str.end(), reg);
        iter != end; ++iter) {
        cout << "\"" << *iter << "\"" << endl;
    }
}
Code snippet from RegularExpressions\regex_token_iterator_1.cpp
The following example asks the user to enter a date and then uses a regex_token_iterator to iterate over the second and third capture groups (month and day), which are specified by using a vector<int>:

regex reg("^(\\d{4})/(0?[1-9]|1[0-2])/(0?[1-9]|[1-2][0-9]|3[0-1])$");
while (true) {
    cout << "Enter a date (year/month/day) (q=quit): ";
    string str;
    if (!getline(cin, str) || str == "q")
        break;
    vector<int> vec = {2, 3};
    const sregex_token_iterator end;
    for (sregex_token_iterator iter(str.begin(), str.end(), reg, vec);
        iter != end; ++iter) {
        cout << "\"" << *iter << "\"" << endl;
    }
}
Code snippet from RegularExpressions\regex_token_iterator_2.cpp

This code prints only the month and day of valid dates. Output generated by this example can look as follows:

Enter a date (year/month/day) (q=quit): 2011/1/13
"1"
"13"
Enter a date (year/month/day) (q=quit): 2011/1/32
Enter a date (year/month/day) (q=quit): 2011/12/5
"12"
"5"

The regex_token_iterator can also be used to perform so-called field splitting or tokenization. It is a much safer and more flexible alternative to the old strtok() function. Tokenization is triggered in the regex_token_iterator constructor by specifying -1 as the capture group index to iterate over. In tokenization mode, the iterator iterates over all sub-strings of the source string that do not match the regular expression.
The following code demonstrates this by tokenizing a string on the delimiters , and ; with any number of whitespace characters before or after the delimiters:

regex reg("\\s*[,;]+\\s*");
while (true) {
    cout << "Enter a string to split on ',' and ';' (q=quit): ";
    string str;
    if (!getline(cin, str) || str == "q")
        break;
    const sregex_token_iterator end;
    for (sregex_token_iterator iter(str.begin(), str.end(), reg, -1);
        iter != end; ++iter) {
        cout << "\"" << *iter << "\"" << endl;
    }
}
Code snippet from RegularExpressions\regex_token_iterator_field_splitting.cpp

The regular expression in this example searches for patterns that match the following: zero or more whitespace characters, followed by one or more , or ; characters, followed by zero or more whitespace characters. The output can be as follows:

Enter a string to split on ',' and ';' (q=quit): This is, a; test string.
"This is"
"a"
"test string."

As you can see from this output, the string is split on , and ; and all whitespace characters around the , or ; are removed. This happens because the tokenization iterator iterates over all sub-strings that do not match the regular expression, and the regular expression matches , and ; together with the whitespace around them.

regex_replace()
The regex_replace() algorithm requires a regular expression and a formatting string that is used to replace matching sub-strings. This formatting string can reference parts of the matched sub-strings by using the following escape sequences:

ESCAPE SEQUENCE    REPLACED WITH
$n                 the string matching the n-th capture group; for example, $1 for the first capture group, $2 for the second, and so on
$&                 the string matching the whole regular expression, which is the same as $0
$`                 the part of the source string that appears to the left of the sub-string matching the regular expression
$'                 the part of the source string that appears to the right of the sub-string matching the regular expression
$$                 a dollar sign

There are six versions of the regex_replace() algorithm.
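The difference between them is in the type of arguments:

template<class OutputIterator, class BidirectionalIterator, class traits, class charT, class ST, class SA>
OutputIterator regex_replace(OutputIterator out, BidirectionalIterator first, BidirectionalIterator last, const basic_regex<charT, traits>& e, const basic_string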