Beautiful Code [120]
I wrote a simple program to read the BNF grammar from the XML specification, calculate the flag values for each of the 65,536 BMP code points, and then store them in one big binary file. I saved this binary data file along with my source code, and modified my Ant compile task to copy it into the build directory (Example 5-8).
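The generator program itself is not reproduced here. As a rough illustration only, a table builder along these lines might look like the following sketch. The class name FlagTableWriter, the two predicates, and the output filename are all assumptions made for the example; in particular, the predicates cover only an ASCII subset, whereas the real program derives its rules from the grammar in the XML specification. The bit values are chosen to match the constants that appear later in Example 5-10.

```java
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical generator: one byte of flags per BMP code point.
final class FlagTableWriter {

    // Bit assignments assumed to match the constants in Example 5-10
    static final byte NAME_CHARACTER = 2;
    static final byte NAME_START_CHARACTER = 4;

    // Stand-in predicate (ASCII only); not the real grammar-derived rule
    static boolean isNameStart(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || c == '_' || c == ':';
    }

    // Stand-in predicate (ASCII only); not the real grammar-derived rule
    static boolean isNameChar(char c) {
        return isNameStart(c) || (c >= '0' && c <= '9')
            || c == '-' || c == '.';
    }

    // Compute the flag byte for every code point from 0x0000 to 0xFFFF
    static byte[] buildTable() {
        byte[] flags = new byte[65536];
        for (int i = 0; i < flags.length; i++) {
            char c = (char) i;
            byte b = 0;
            if (isNameChar(c)) b |= NAME_CHARACTER;
            if (isNameStart(c)) b |= NAME_START_CHARACTER;
            flags[i] = b;
        }
        return flags;
    }

    public static void main(String[] args) throws IOException {
        // Write the raw 65,536 bytes; the filename is an assumption
        try (DataOutputStream out = new DataOutputStream(
                new FileOutputStream("characters.dat"))) {
            out.write(buildTable());
        }
    }
}
```

Because the file is nothing more than 65,536 raw bytes in code point order, a reader can load it with a single readFully call and index it directly by char value.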
Example 5-8. Saving and copying the binary lookup table
From there, the jar task will bundle the lookup table with the compiled .class files, so it doesn't add an extra file to the distribution or cause any added dependencies. The Verifier class can then use the class loader to find this file at runtime, as shown in Example 5-9.
Example 5-9. Loading the binary lookup table
private static byte[] flags = null;

static {
    ClassLoader loader = Verifier.class.getClassLoader();
    if (loader != null) loadFlags(loader);
    // If that didn't work, try a different ClassLoader
    if (flags == null) {
        loader = Thread.currentThread().getContextClassLoader();
        loadFlags(loader);
    }
}

private static void loadFlags(ClassLoader loader) {
    DataInputStream in = null;
    try {
        InputStream raw = loader.getResourceAsStream("nu/xom/characters.dat");
        if (raw == null) {
            throw new RuntimeException("Broken XOM installation: "
              + "could not load nu/xom/characters.dat");
        }
        in = new DataInputStream(raw);
        flags = new byte[65536];
        in.readFully(flags);
    }
    catch (IOException ex) {
        throw new RuntimeException("Broken XOM installation: "
          + "could not load nu/xom/characters.dat");
    }
    finally {
        try {
            if (in != null) in.close();
        }
        catch (IOException ex) {
            // no big deal
        }
    }
}
This table takes up 64 KB of heap space (65,536 entries of one byte each). However, that's not really a problem on a desktop or server, and we only have to load this data once. The code is careful not to reload the data once it's already been loaded.
Now that the lookup table is stored in memory, checking any property of any character is a simple matter of performing an array lookup followed by a couple of bitwise operations. Example 5-10 shows the new code for checking a noncolonized name, such as an element or attribute local name. All we have to do is look up the flags in the table and compare the bit corresponding to the desired property.
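Before reading the full listing, it may help to see the lookup-and-mask step in isolation. The following is a minimal sketch, not XOM code: the class FlagLookupDemo, the helper has, and the toy table (which marks only ASCII letters and digits) are assumptions made for illustration. The two constants reuse the bit values from Example 5-10.

```java
// Hypothetical demo of testing one property bit in a per-character flag table.
final class FlagLookupDemo {

    static final byte NAME_START_CHARACTER = 4;
    static final byte NCNAME_CHARACTER = 8;

    // Toy table: ASCII letters may start or continue a name,
    // ASCII digits may only continue one. Everything else is 0.
    static final byte[] FLAGS = new byte[65536];
    static {
        for (char c = 'a'; c <= 'z'; c++) {
            FLAGS[c] = NAME_START_CHARACTER | NCNAME_CHARACTER;
        }
        for (char c = 'A'; c <= 'Z'; c++) {
            FLAGS[c] = NAME_START_CHARACTER | NCNAME_CHARACTER;
        }
        for (char c = '0'; c <= '9'; c++) {
            FLAGS[c] = NCNAME_CHARACTER;
        }
    }

    // A char is an unsigned 16-bit value, so it is always a valid index
    // into a 65,536-element array; no range check is needed.
    static boolean has(char c, byte flag) {
        return (FLAGS[c] & flag) != 0;
    }

    public static void main(String[] args) {
        System.out.println(has('a', NAME_START_CHARACTER)); // true
        System.out.println(has('7', NAME_START_CHARACTER)); // false
        System.out.println(has('7', NCNAME_CHARACTER));     // true
        System.out.println(has(':', NCNAME_CHARACTER));     // false
    }
}
```

Each property costs one bit, so a single byte per character can carry up to eight independent properties, and testing any of them is one array read and one AND.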
Example 5-10. Using the lookup table to check a name
// constants for the bit flags in the characters lookup table
private final static byte XML_CHARACTER = 1;
private final static byte NAME_CHARACTER = 2;
private final static byte NAME_START_CHARACTER = 4;
private final static byte NCNAME_CHARACTER = 8;

static void checkNCName(String name) {
    if (name == null) {
        throwIllegalNameException(name, "NCNames cannot be null");
    }
    int length = name.length();
    if (length == 0) {
        throwIllegalNameException(name, "NCNames cannot be empty");
    }
    char first = name.charAt(0);
    if ((flags[first] & NAME_START_CHARACTER) == 0) {
        throwIllegalNameException(name, "NCNames cannot start " +
          "with the character " + Integer.toHexString(first));
    }
    for (int i = 1; i < length; i++) {
        char c = name.charAt(i);
        if ((flags[c] & NCNAME_CHARACTER) == 0) {
            if (c == ':') {
                throwIllegalNameException(name,