Beautiful Code [190]
Genes that pass the filter are put into a hash, keyed by gene ID. Finally, the filter calls a routine named weedUnlessInHash that loops through each gene in the input to see whether it is in the hash and, if so, copies the gene to the output. The net result is a fast and flexible system in a relatively small amount of code.
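The final step described above, keeping only the input genes whose IDs are in the hash, can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the Gene Sorter's actual code: the names struct hash, hashAdd, hashContains, and keepIfInHash are hypothetical, the table has a fixed size, and each gene is represented only by its ID string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUCKET_COUNT 1024   /* Arbitrary table size chosen for this sketch. */

struct hashEl
/* One entry in a bucket's chain. */
    {
    struct hashEl *next;
    const char *name;       /* Gene ID; not copied, so the caller must keep it alive. */
    };

struct hash
/* A set of gene IDs. Must be zero-initialized before use. */
    {
    struct hashEl *buckets[BUCKET_COUNT];
    };

unsigned hashString(const char *s)
/* Map a string to a bucket index. */
{
unsigned h = 5381;
while (*s != 0)
    h = h * 33 + (unsigned char)*s++;
return h % BUCKET_COUNT;
}

void hashAdd(struct hash *hash, const char *name)
/* Record that the gene with this ID passed the filter.
 * Entries are never freed in this sketch. */
{
struct hashEl *el = malloc(sizeof(*el));
unsigned i = hashString(name);
assert(el != NULL);
el->name = name;
el->next = hash->buckets[i];
hash->buckets[i] = el;
}

int hashContains(const struct hash *hash, const char *name)
/* Return 1 if name is in the hash, 0 otherwise. */
{
const struct hashEl *el;
for (el = hash->buckets[hashString(name)]; el != NULL; el = el->next)
    if (strcmp(el->name, name) == 0)
        return 1;
return 0;
}

int keepIfInHash(const char **in, int inCount,
    const struct hash *passed, const char **out)
/* Copy to out each gene ID from in that is present in passed,
 * preserving input order. Returns the number of IDs copied. */
{
int i, outCount = 0;
for (i = 0; i < inCount; ++i)
    if (hashContains(passed, in[i]))
        out[outCount++] = in[i];
return outCount;
}
```

A caller would zero a struct hash, call hashAdd once for each gene that passes the filter, and then call keepIfInHash with the full input list. Because each lookup takes roughly constant time, the final pass is linear in the number of input genes.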
13.5. Theory of Beautiful Code in the Large
The Gene Sorter is one of the more beautiful programs at the design and code level that I've worked on. Most of the major parts of the system, including the cart, the directory of .ra files, and the interface to the genomics database, are on their second or third iterations and incorporate lessons we learned from previous programs. The structure of the program's objects nicely parallels the major components of the user interface and the relational databases. The algorithms used are simple but effective, and make good trade-offs between speed, memory usage, and code complexity. The program has had very few bugs compared to most programs its size. Other people are able to come up to speed on the code base and contribute to it relatively quickly.
Programming is a human activity, and perhaps the resource that limits us most when programming is our human memory. We can typically keep a half-dozen things in our short-term memory. Any more than that requires us to involve our long-term memory as well. Our long-term memory system actually has an amazingly large capacity, but we enter things into it relatively slowly, and we can't retrieve things from it randomly, only by association.
While the structure of a program of no more than a few hundred lines can be dictated by algorithmic and machine considerations, the structure of larger programs must be dictated by human considerations, at least if we expect humans to work productively to maintain and extend them in the long term.
Ideally, everything that you need to understand a piece of code should fit on a single screen. If not, the reader of the code will be forced at best to hop around from screen to screen in hopes of understanding the code. If the code is complex, the reader is likely to have forgotten what is defined on each screen by the time he gets back to the initial screen, and will actually have to memorize large amounts of the code before he can understand any part of it. Needless to say, this will slow down programmers, and many of them will find it frustrating as well.
Well-chosen names are very important to making code locally understandable. It's OK to have a few local variables (no more than can fit in short-term memory) with one- and two-letter names. All other names should be words, short phrases, or commonly used (and short) abbreviations. In most cases, the reader should be able to tell the purpose of a variable or function just from its name.
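As a small illustration of this naming guideline, consider the following sketch. The function averageExpression and its parameters are hypothetical and are not taken from the Gene Sorter. The function and its parameters carry descriptive names, while the loop index, which never leaves short-term memory, gets a single letter.

```c
double averageExpression(const double *values, int count)
/* Return the mean of count values, or 0 if count is not positive. */
{
double sum = 0;
int i;      /* A short name is fine: used only in the loop below. */
if (count <= 0)
    return 0;
for (i = 0; i < count; ++i)
    sum += values[i];
return sum / count;
}
```

A reader who meets a call such as averageExpression(values, count) can guess what it does without looking up the definition, which is the point of choosing the name carefully.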
These days, with our fancy integrated development environments, the reader can, at the click of a mouse, go from where a symbol is used to where it is defined. However, we want to write our code so that the user needs to go to the symbol definition only when she is curious about the details. We shouldn't force her to follow a couple of hyperlinks to understand each line.
Names can be too long as well as too short, though most programmers, influenced by the mathematical descriptions of algorithms and such evils as Hungarian notation, err on the short side. It may take some time to come up with a good name, but it is time well spent.
For a local variable, a well-chosen name may be sufficient documentation. Thus, Example 13-3 shows a reasonably nicely done function from the Gene Sorter. It filters associations according to criteria that can contain wildcards. (There is also a simpler, faster