Online Book Reader

Home Category

Beautiful Code [83]

By Root 5184 0
emphasis on tools, it also seemed best to focus on the class of regular expressions found in grep—rather than, say, those from shell wildcards—since we could also then talk about the design of grep itself.

The problem was that any existing regular expression package was far too big. The local grep was over 500 lines long (about 10 book pages) and encrusted with barnacles. Open source regular expression packages tended to be huge—roughly the size of the entire book—because they were engineered for generality, flexibility, and speed; none were remotely suitable for pedagogy.

I suggested to Rob that we find the smallest regular expression package that would illustrate the basic ideas while still recognizing a useful and nontrivial class of patterns. Ideally, the code would fit on a single page.

Rob disappeared into his office. As I remember it now, he emerged in no more than an hour or two with the 30 lines of C code that subsequently appeared in Chapter 9 of The Practice of Programming. That code implements a regular expression matcher that handles the following constructs.

Character Meaning

c Matches any literal character c.

. (period) Matches any single character.

^ Matches the beginning of the input string.

$ Matches the end of the input string.

* Matches zero or more occurrences of the previous character.

This is quite a useful class; in my own experience of using regular expressions on a day-to-day basis, it easily accounts for 95 percent of all instances. In many situations, solving the right problem is a big step toward creating a beautiful program. Rob deserves great credit for choosing a very small yet important, well-defined, and extensible set of features from among a wide set of options.

Rob's implementation itself is a superb example of beautiful code: compact, elegant, efficient, and useful. It's one of the best examples of recursion that I have ever seen, and it shows the power of C pointers. Although at the time we were most interested in conveying the important role of good notation in making a program easier to use (and perhaps easier to write as well), the regular expression code has also been an excellent way to illustrate algorithms, data structures, testing, performance enhancement, and other important topics.

A Regular Expression Matcher > Implementation

1.2. Implementation

In The Practice of Programming, the regular expression matcher is part of a standalone program that mimics grep, but the regular expression code is completely separable from its surroundings. The main program is not interesting here; like many Unix tools, it reads either its standard input or a sequence of files, and prints those lines that contain a match of the regular expression.

This is the matching code:

Code View: Scroll / Show All

/* match: search for regexp anywhere in text */

int match(char *regexp, char *text)

{

if (regexp[0] == '^')

return matchhere(regexp+1, text);

do { /* must look even if string is empty */

if (matchhere(regexp, text))

return 1;

} while (*text++ != '\0');

return 0;

}

/* matchhere: search for regexp at beginning of text */

int matchhere(char *regexp, char *text)

{

if (regexp[0] == '\0')

return 1;

if (regexp[1] == '*')

return matchstar(regexp[0], regexp+2, text);

if (regexp[0] == '$' && regexp[1] == '\0')

return *text == '\0';

if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))

return matchhere(regexp+1, text+1);

return 0;

}

/* matchstar: search for c*regexp at beginning of text */

int matchstar(int c, char *regexp, char *text)

{

do { /* a * matches zero or more instances */

if (matchhere(regexp, text))

return 1;

} while (*text != '\0' && (*text++ == c || c == '.'));

return 0;

}

A Regular Expression Matcher > Discussion

1.3. Discussion

The function match(regexp, text) tests whether there is an occurrence of the regular expression anywhere within the text; it returns 1 if a match is found and 0 if not. If there is more than one match, it finds the leftmost and shortest.

The

Return Main Page Previous Page Next Page

®Online Book Reader