Beautiful Code [9]
And as if to confirm suspicions about the origins of good design, the delta editor was created by a single person over the course of a few hours (although that person was very familiar with the problem and the code base).
To understand what makes the delta editor beautiful, we must start by examining the problem it solves.
2.1. Version Control and Tree Transformation
Very early in the Subversion project, the team realized we had a general task that would be performed over and over: that of minimally expressing the difference between two similar (usually related) directory trees. As a version control system, one of Subversion's goals is to track revisions to directory structures as well as individual file contents. In fact, Subversion's server-side repository is fundamentally designed around directory versioning. A repository is simply a series of snapshots of a directory tree as that tree transforms over time. For each changeset committed to the repository, a new tree is created, differing from the preceding tree exactly where the changes are located and nowhere else. The unchanged portions of the new tree share storage with the preceding tree, and so on back into time. Each successive version of the tree is labeled with a monotonically increasing integer; this unique identifier is called a revision number.
Think of the repository as an array of revision numbers, stretching off into infinity. By convention, revision 0 is always an empty directory. In Figure 2-1, revision 1 has a tree hanging off it (typically the initial import of content into the repository), and no other revisions have been committed yet. The boxes represent nodes in this virtual filesystem: each node is either a directory (labeled DIR in the upper-right corner) or a file (labeled FILE).
What happens when we modify tuna? First, we make a new file node, containing the latest text. The new node is not connected to anything yet. As Figure 2-2 shows, it's just hanging out there in space, with no name.
Next, we create a new revision of its parent directory. As Figure 2-3 shows, the subgraph is still not connected to the revision array.
Figure 2-1. Conceptual view of revision numbers
Figure 2-2. New node when just created
Figure 2-3. Creation of new parent directory
We continue up the line, creating a new revision of the next parent directory (Figure 2-4).
At the top, we create a new revision of the root directory, as shown in Figure 2-5. This new directory needs an entry to point to the "new" directory A, but since directory B hasn't changed at all, the new root directory also has an entry still pointing to the old directory B's node.
Figure 2-4. Continuing to move up, creating parent directories
Figure 2-5. Complete new directory tree
Now that all the new nodes are written, we finish the "bubble up" process by linking the new tree to the next available revision in the history array, thus making it visible to repository users (Figure 2-6). In this case, the new tree becomes revision 2.
Thus each revision in the repository points to the root node of a unique tree, and the difference between that tree and the preceding one is the change that was committed in the new revision. To trace the changes, a program walks down both trees simultaneously, noting where entries point to different places. (For brevity, I've left out some details, such as saving storage space by compressing older nodes as differences against their newer versions.)
Figure 2-6. Finished revision: link to new tree
Although this tree-versioning model is all background to the main point of this chapter (the delta editor, which we'll come to soon), it has such nice properties that I considered making it the subject of its own chapter, as an example of beautiful code. Some of its attractive features are:
Easy reads
To locate revision n of file /path/to/foo.txt, one jumps to revision n, then walks down the tree to /path/to/foo.txt.
Writers don't interfere with readers
As writers