Beautiful Code [224]
umap_mapids(vp1->v_mount, credp);
}
What we have here is a case of data describing the format of other data: a redirection in terms of data abstraction. This metadata allows the credential mapping code to manipulate the arguments of arbitrary system calls.
Another Level of Indirection > From Code to a Domain-Specific Language
17.4. From Code to a Domain-Specific Language
You may have noticed that some of the code associated with the implementation of the read system call, such as the packing of its arguments into a structure or the logic for calling the appropriate function, is highly stylized and is probably repeated in similar forms for all 52 other interfaces. Another implementation detail, which we have not so far discussed and which can keep me awake at nights, concerns locking.
Operating systems must ensure that various processes running concurrently don't step on each other's toes when they modify data without coordination between them. On modern multithreaded, multi-core processors, ensuring data consistency by maintaining one mutual exclusion lock for all critical operating system structures (as was the case in older operating system implementations) would result in an intolerable drain on performance. Therefore, locks are nowadays held over fine-grained objects, such as a user's credentials or a single buffer. Furthermore, because obtaining and releasing locks can be expensive operations, ideally once a lock is held it should not be released if it will be needed again in short order. These locking specifications can best be described through preconditions (what the state of a lock must be before entering a function) and postconditions (the state of the lock at a function's exit).
As you can imagine, programming under those constraints and verifying the code's correctness can be hellishly complicated. Fortunately for me, another level of indirection can be used to bring some sanity into the picture. This indirection handles both the redundancy of packing code and the fragile locking requirements.
In the FreeBSD kernel, the interface functions and data structures we've examined, such as VOP_READ_AP, VOP_READ_APV, and vop_read_desc, aren't directly written in C. Instead, a domain-specific language is used to specify the types of each call's arguments and their locking pre- and postconditions. Such an implementation style always raises my pulse, because the productivity boost it gives can be enormous. Here is an excerpt from the read system call specification:
#
#% read vp L L L
#
vop_read {
IN struct vnode *vp;
INOUT struct uio *uio;
IN int ioflag;
IN struct ucred *cred;
};
From specifications such as the above, an awk script creates:
C code for packing the arguments of the functions into a single structure
Declarations for the structures holding the packed arguments and the functions doing the work
Initialized data specifying the contents of the packed argument structures
The boilerplate C code we saw used for implementing filesystem layers
Assertions for verifying the state of the locks when the function enters and exits
In the FreeBSD version 6.1 implementation of the vnode call interface, all in all, 588 lines of domain-specific code expand into 4,339 lines of C code and declarations.
Such compilation from a specialized high-level domain-specific language into C is quite common in the computing field. For example, the input to the lexical analyzer generator lex is a file that maps regular expressions into actions; the input to the parser generator yacc is a language's grammar and corresponding production rules. Both systems (and their descendants flex and bison) generate C code implementing the high-level specifications. A more extreme case involves the early implementations of the C++ programming language. These consisted of a preprocessor, cfront, that would compile C++ code into C.
In all these cases, C is used as a portable assembly language. When used appropriately, domain-specific languages increase