Professional C__ - Marc Gregoire [260]
Enter a date (year/month/day) (q=quit): 11/12/01
Invalid date!
These date-matching examples only check whether the date consists of a year (four digits), a month (1-12), and a day (1-31). They perform no further validation, so they accept impossible dates such as February 30 and ignore leap years. If you need that, you have to write code to validate the year, month, and day values that regex_match() extracts. This validation is not a job for regular expressions, so it is not shown.
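Purely as an illustration of that division of labor, the following sketch uses its own simple regular expression (\d{4})/(\d{1,2})/(\d{1,2}), not the exact one from the preceding examples. The helper names isLeapYear(), isValidDate(), and matchAndValidate() are hypothetical. The regular expression only extracts the three numbers; ordinary code then checks the month length and the Gregorian leap-year rule.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: Gregorian leap-year rule.
bool isLeapYear(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// Hypothetical helper: checks a year/month/day triple that a regex extracted.
bool isValidDate(int year, int month, int day)
{
    if (month < 1 || month > 12 || day < 1)
        return false;
    static const int daysInMonth[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    int maxDay = daysInMonth[month - 1];
    if (month == 2 && isLeapYear(year))
        maxDay = 29;   // February has 29 days in a leap year
    return day <= maxDay;
}

// Hypothetical end-to-end check: the regex only captures the three numbers,
// the range and leap-year checks happen in isValidDate().
bool matchAndValidate(const std::string& input)
{
    static const std::regex r("(\\d{4})/(\\d{1,2})/(\\d{1,2})");
    std::smatch m;
    if (!std::regex_match(input, m, r))
        return false;
    return isValidDate(std::stoi(m[1].str()), std::stoi(m[2].str()),
                       std::stoi(m[3].str()));
}
```

For example, matchAndValidate("2012/02/29") returns true, whereas matchAndValidate("2011/02/29") returns false because 2011 is not a leap year.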
regex_search()
The regex_match() algorithm discussed in the previous section returns true only if the entire source string matches the regular expression, and false otherwise, so it cannot be used to find a matching sub-string within the source string. The regex_search() algorithm fills that gap: it searches the source string for a sub-string that matches a given pattern. There are six versions of regex_search(); like the six versions of regex_match(), they differ only in the types of their arguments. See the Standard Library Reference resource on the website for more details.
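To make the difference concrete, the following sketch applies the same pattern \d+ with both algorithms. The helper names wholeStringIsNumber() and containsNumber() are hypothetical. regex_match() succeeds only if the whole input consists of digits, whereas regex_search() succeeds as soon as any sub-string matches.

```cpp
#include <regex>
#include <string>

// Pattern: one or more digits, declared once and shared by both helpers.
static const std::regex digits("\\d+");

// true only if the entire string consists of digits
bool wholeStringIsNumber(const std::string& s)
{
    return std::regex_match(s, digits);
}

// true if any sub-string of s consists of digits
bool containsNumber(const std::string& s)
{
    return std::regex_search(s, digits);
}
```

With the input "abc 42", wholeStringIsNumber() returns false while containsNumber() returns true; with "42", both return true.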
One of the versions of regex_search() accepts a begin and end iterator into the string that you want to process. You might be tempted to call this version in a loop to find all occurrences of a pattern, adjusting the begin and end iterators before each call. Never do this! Each call treats its begin iterator as the start of the input, so anchors (^ or $), word boundaries, and similar constructs can match where they should not. If the pattern can match an empty string, the loop can also run forever, because the begin iterator never advances. Use regex_iterator or regex_token_iterator, as explained later in this chapter, to extract all occurrences of a pattern from a source string.
Never use regex_search() in a loop to find all occurrences of a pattern in a source string. Instead, use a regex_iterator or regex_token_iterator.
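The following sketch makes the anchor problem concrete. The hypothetical function countWithSearchLoop() uses the forbidden technique of restarting regex_search() at the end of the previous match, while the hypothetical countWithIterator() lets a sregex_iterator do the work. With the pattern ^a and the input "aaa", the loop counts three matches because every call sees its new begin iterator as the start of the input; the iterator counts one, because the standard specifies that regex_iterator passes match_prev_avail to every search after the first.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Anti-pattern, for illustration only: restarts regex_search() right after
// the previous match. Each call treats begin as the start of the input, so
// ^ matches again. If the pattern could match an empty string, begin would
// never advance and this loop would never end.
int countWithSearchLoop(const std::string& s, const std::regex& r)
{
    int count = 0;
    std::smatch m;
    auto begin = s.cbegin();
    while (std::regex_search(begin, s.cend(), m, r)) {
        ++count;
        begin = m[0].second;   // continue after the match
    }
    return count;
}

// Recommended: sregex_iterator tracks the position itself and tells each
// later search that characters before the new start exist.
int countWithIterator(const std::string& s, const std::regex& r)
{
    std::sregex_iterator it(s.cbegin(), s.cend(), r);
    std::sregex_iterator end;
    return static_cast<int>(std::distance(it, end));
}
```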
regex_search() Example
The regex_search() algorithm can be used to extract matching sub-strings from an input string. The following example extracts code comments from input lines. The regular expression searches for a sub-string that starts with //, followed by optional whitespace \s* (written as \\s* inside a C++ string literal), followed by one or more characters captured in the capture group (.+). This capture group captures only the comment text. The smatch object m receives the search results. To get a string representation of the first capture group, you can write m[1], as in the following code, or m[1].str(). The iterators m[1].first and m[1].second tell you exactly where the sub-string was found in the source string.
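#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main()
{
    // "//", optional whitespace, then capture the rest of the line.
    // The backslash must be escaped inside a normal C++ string literal.
    regex r("//\\s*(.+)");
    while (true) {
        cout << "Enter a string (q=quit): ";
        string str;
        if (!getline(cin, str) || str == "q")
            break;
        smatch m;   // receives the search results
        if (regex_search(str, m, r))
            cout << " Found comment '" << m[1] << "'" << endl;
        else
            cout << " No comment found!" << endl;
    }
}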
Code snippet from RegularExpressions\regex_search_comments.cpp
The output of this program can look as follows:
Enter a string (q=quit): std::string str; // Our source string
Found comment 'Our source string'
Enter a string (q=quit): int a; // A comment with // in the middle
Found comment 'A comment with // in the middle'
Enter a string (q=quit): float f; // A comment with a (tab) character
Found comment 'A comment with a (tab) character'
The match_results object also has prefix() and suffix() methods, which return the part of the source string preceding and following the match, respectively.
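The following sketch reuses the comment pattern from the example above to show prefix() and the m[1].first iterator in action. The names CodeAndComment and splitComment() are hypothetical. prefix() yields the code in front of the comment, and subtracting line.cbegin() from m[1].first turns the iterator into an index. For a line without line terminators, suffix() is empty here, because (.+) consumes the rest of the line.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type for splitComment().
struct CodeAndComment {
    std::string code;      // text before the "//", taken from m.prefix()
    std::string comment;   // capture group 1
    std::size_t offset;    // index in the line where the comment text starts
};

// Hypothetical helper: splits a source line into code and comment.
bool splitComment(const std::string& line, CodeAndComment& out)
{
    static const std::regex r("//\\s*(.+)");
    std::smatch m;
    if (!std::regex_search(line, m, r))
        return false;
    out.code = m.prefix().str();
    out.comment = m[1].str();
    // m[1].first is an iterator into line; convert it to an index.
    out.offset = static_cast<std::size_t>(m[1].first - line.cbegin());
    return true;
}
```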
regex_iterator
As explained in the previous section, you should never use regex_search() in a loop to extract all occurrences of a pattern from a source string. Instead, use a regex_iterator or regex_token_iterator. They work similarly to the iterators for STL containers, which are discussed in Chapter 12.
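Internally, both regex_iterator and regex_token_iterator store a pointer to the regular expression. Because of this, you should not create them with a temporary regex object: the temporary is destroyed at the end of the full-expression that creates the iterator, leaving the iterator with a dangling pointer.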
Never try to create a regex_iterator or regex_token_iterator with a temporary regex object.
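A minimal sketch of the safe pattern, using a hypothetical helper findAll(): the regex is a named local object that outlives the iterators. The commented-out line shows the mistake the warning refers to; since C++14, the regex_iterator constructor that takes an rvalue regex is deleted, so that line would not compile.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every match of pattern in text.
std::vector<std::string> findAll(const std::string& text, const std::string& pattern)
{
    const std::regex r(pattern);   // named object, lives until the function returns
    std::vector<std::string> result;
    for (std::sregex_iterator it(text.cbegin(), text.cend(), r), end; it != end; ++it)
        result.push_back(it->str());   // whole match of the current result
    // Wrong: std::sregex_iterator bad(text.cbegin(), text.cend(), std::regex(pattern));
    // The temporary regex would die at the end of that statement.
    return result;
}
```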
regex_iterator Example
The following example asks the user to enter