Online Book Reader

Home Category

Choose a category
All
Classic-Fiction

Beautiful Code [106]

By Root 7550 0

.]+) }

3 puts $1

4 end

5 end

The differences are subtle. In line 2, I've added a pair of parentheses (in boldface) around the interesting part of the article name in the regular expression. In line 3, instead of printing out the whole value of line, I print out $1, which in Ruby (and several other regular-expression-friendly languages) means "the first place in the regular expression marked with parentheses." You can mark lots of different pieces of the expression, and thus use $2, $3, and so on.

The first few lines of output produced by running this program over some logfile data look like this:

2003/10/10/FooCampMacs

2006/11/13/Rough-Mix

2003/05/22/StudentLookup

2003/11/13/FlyToYokohama

2003/07/31/PerlAngst

2003/05/21/RDFNet

2003/02/23/Democracy

2005/12/30/Spolsky-Recursion

2004/05/08/Torture

2004/04/27/RSSticker

Before we go to work determining the popularity of different articles, I'd like to argue that in some important ways, this code is beautiful. Take a moment and think of the code you'd have to write to look at an arbitrary chunk of text and do the same matching and selection work done by the parenthesized regexp. It would be quite a few lines of code, and it would be easy to get wrong. Furthermore, if the format of the logfile changed, fixing the pattern matcher would be error-prone and irritating.

Under the covers, the way that regular expressions work is also among the more wonderful things in computer science. It turns out that they can conveniently be translated into finite automata. These automata are mathematically elegant, and there are astoundingly efficient algorithms for matching them against the text you're searching. The great thing is that when you're running an automaton, you have to look only once at each character in the text you're trying to match. The effect is that a well-built regular expression engine can do pattern matching and selection faster than almost any custom code, even if it were written in hand-optimized assembly language. That's beautiful.

I think that the Ruby code is pretty attractive, too. Nearly every character of the program is doing useful work. Note that there are no semicolons on the ends of the lines, nor parentheses around the conditional block, and that you can write puts line instead of puts(line). Also, variables aren't declared—they're just used. This kind of stripped-down language design makes for programs that are shorter and easier to write, as well as (more important) easier to read and easier to understand.

Thinking in terms of time, regular expressions are a win/win. It takes the programmer way less time to write them than the equivalent code, it takes less time to deliver the program to the people waiting for it, it uses the computer really efficiently, and the program's user spends less time sitting there bored.

4.2.3. Content-Addressable Storage

Now we're approaching the core of our problem, computing the popularity of articles. We'll have to pull the article name out of each line, look it up to see how many times it's been fetched, add one to that number, and then store it away again.

This may be the most basic of search patterns: we start with a key (what we're using to search—in this case, an article name), and we're looking for a value (what we want to find—in this case, the number of times the article has been fetched). Here are some other examples:

Key Value

Word List of web pages containing the word

Employee number Employee's personnel record

Passport number "true" or "false," indicating whether the person with that passport should be subject to extra scrutiny

What programmers really want in this situation is a very old idea in computer science: content-addressable memory, also known as an associative store and various other permutations of those words. The idea is to put the key in and get the value out. There actually exists hardware which does just that; it mostly lives deep in the bowels of microprocessors, providing rapid access to page tables and memory caches.

The good news is that you, the

Online Book Reader

Beautiful Code [106]

®Online Book Reader