Beautiful Code [84]
Notice that advancing over the input string is done with a do-while loop, a comparatively unusual construct in C programs. The occurrence of a do-while instead of a while should always raise a question: why isn't the loop termination condition being tested at the beginning of the loop, before it's too late, rather than at the end after something has been done? But the test is correct here: since the * operator permits zero-length matches, we first have to check whether a null match is possible.
The bulk of the work is done in the function matchhere(regexp, text), which tests whether the regular expression matches the text that begins right here. The function matchhere operates by attempting to match the first character of the regular expression with the first character of the text. If the match fails, there can be no match at this text position and matchhere returns 0. If the match succeeds, however, it's possible to advance to the next character of the regular expression and the next character of the text. This is done by calling matchhere recursively.
The situation is a bit more complicated because of some special cases, and of course the need to stop the recursion. The easiest case is that if the regular expression is at its end (regexp[0]=='\0'), all previous tests have succeeded, and thus the regular expression matches the text.
If the regular expression is a character followed by a *, matchstar is called to see whether the closure matches. The function matchstar(c, regexp, text) tries to match repetitions of the text character c, beginning with zero repetitions and counting up, until it either finds a match of the rest of the text, or it fails and thus concludes that there is no match. This algorithm identifies a "shortest match," which is fine for simple pattern matching as in grep, where all that matters is finding a match as quickly as possible. A "longest match" is more intuitive and almost certain to be better for a text editor where the matched text will be replaced. Most modern regular expression libraries provide both alternatives, and The Practice of Programming presents a simple variant of matchstar for this case, shown below.
If the regular expression consists of a $ at the end of the expression, the text matches only if it too is at its end:
if (regexp[0] == '$' && regexp[1] == '\0')
return *text == '\0';
Otherwise, if we are not at the end of the text string (that is, *text!='\0'), and if the first character of the text string matches the first character of the regular expression, so far so good; we go on to test whether the next character of the regular expression matches the next character of the text by making a recursive call to matchhere. This recursive call is the heart of the algorithm and the reason why the code is so compact and clean.
If all of these attempts to match fail, there can be no match at this point between the regular expression and the text, so matchhere returns 0.
This code uses C pointers intensively. At each stage of the recursion, if something matches, the recursive call that follows uses pointer arithmetic (e.g., regexp+1 and text+1) so that the subsequent function is called with the next character of the regular expression and of the text. The depth of recursion is no