Beautiful Code [120]

it. I began by defining a simple binary format, one byte for each of the 65,536 Unicode code points in the Basic Multilingual Plane (BMP). Each byte contains eight bit flags that identify the most important character properties. For instance, bit 1 is on if the character is legal in PCDATA content, and off if it is not legal. Bit 2 is on if the character can be used in an XML name, and off if it cannot. Bit 3 is on if the character can be the start of an XML name, and off if it cannot.
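The flag-byte layout can be sketched in a few lines of Java. This is only an illustration: the class name FlagByteSketch and the helpers pack and has are hypothetical, the constant names are borrowed from Example 5-10, and the sketch assumes that "bit 1" means the least significant bit.

```java
// Illustrative sketch only: FlagByteSketch, pack, and has are hypothetical names.
final class FlagByteSketch {

    // Bit values, assuming "bit 1" is the least significant bit.
    static final byte XML_CHARACTER = 1;         // legal in PCDATA content
    static final byte NAME_CHARACTER = 2;        // may appear in an XML name
    static final byte NAME_START_CHARACTER = 4;  // may start an XML name

    // Pack three boolean properties into a single flag byte.
    static byte pack(boolean xmlChar, boolean nameChar, boolean nameStart) {
        int b = 0;
        if (xmlChar) b |= XML_CHARACTER;
        if (nameChar) b |= NAME_CHARACTER;
        if (nameStart) b |= NAME_START_CHARACTER;
        return (byte) b;
    }

    // Test one property by masking the byte with that property's bit.
    static boolean has(byte flags, byte property) {
        return (flags & property) != 0;
    }

    public static void main(String[] args) {
        // A digit such as '7' is legal content and legal inside a name,
        // but it cannot start a name.
        byte digit = pack(true, true, false);
        System.out.println(has(digit, NAME_CHARACTER));
        System.out.println(has(digit, NAME_START_CHARACTER));
    }
}
```

Because each property occupies its own bit, one byte per code point is enough to answer up to eight independent yes-or-no questions about a character.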

I wrote a simple program to read the BNF grammar from the XML specification, calculate the flag values for each of the 65,536 BMP code points, and then store them in one big binary file. I saved this binary data file along with my source code, and modified my Ant compile task to copy it into the build directory (Example 5-8).
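A generator of this kind might look roughly like the following sketch. It is not the program described here: instead of parsing the BNF grammar, it hard-codes the Char production from XML 1.0 (restricted to the BMP) and uses deliberately simplified, ASCII-only stand-ins for the name productions. The class name TableWriterSketch and all of its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Illustrative sketch only: every name in this class is hypothetical.
final class TableWriterSketch {

    static final byte XML_CHARACTER = 1;
    static final byte NAME_CHARACTER = 2;
    static final byte NAME_START_CHARACTER = 4;

    // The Char production from XML 1.0, restricted to the BMP.
    static boolean isXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    // ASSUMPTION: ASCII-only simplification of the name-start rule.
    static boolean isNameStart(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || c == '_' || c == ':';
    }

    // ASSUMPTION: ASCII-only simplification of the name-character rule.
    static boolean isNameChar(int c) {
        return isNameStart(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    // Compute one flag byte for each of the 65,536 BMP code points.
    static byte[] buildTable() {
        byte[] table = new byte[65536];
        for (int c = 0; c < table.length; c++) {
            int f = 0;
            if (isXmlChar(c)) f |= XML_CHARACTER;
            if (isNameChar(c)) f |= NAME_CHARACTER;
            if (isNameStart(c)) f |= NAME_START_CHARACTER;
            table[c] = (byte) f;
        }
        return table;
    }

    // Write the raw table, with no header, to any output stream.
    static void write(OutputStream out) throws IOException {
        out.write(buildTable());
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // An in-memory sink keeps the sketch side-effect free;
        // a FileOutputStream would produce the actual data file.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        write(sink);
        System.out.println(sink.size() + " bytes written");
    }
}
```

Writing the table as raw bytes with no header means the reader can fill a 65,536-byte array with a single readFully call and index it directly by code point.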

Example 5-8. Saving and copying the binary lookup table

description="Compile the source code">

tofile="${build.dest}/nu/xom/characters.dat"/>

From there, the jar task will bundle the lookup table with the compiled .class files, so it doesn't add an extra file to the distribution or cause any added dependencies. The Verifier class can then use the class loader to find this file at runtime, as shown in Example 5-9.

Example 5-9. Loading the binary lookup table

private static byte[] flags = null;

static {
    ClassLoader loader = Verifier.class.getClassLoader();
    if (loader != null) loadFlags(loader);
    // If that didn't work, try a different ClassLoader
    if (flags == null) {
        loader = Thread.currentThread().getContextClassLoader();
        loadFlags(loader);
    }
}

private static void loadFlags(ClassLoader loader) {
    DataInputStream in = null;
    try {
        InputStream raw = loader.getResourceAsStream("nu/xom/characters.dat");
        if (raw == null) {
            throw new RuntimeException("Broken XOM installation: "
              + "could not load nu/xom/characters.dat");
        }
        in = new DataInputStream(raw);
        flags = new byte[65536];
        in.readFully(flags);
    }
    catch (IOException ex) {
        throw new RuntimeException("Broken XOM installation: "
          + "could not load nu/xom/characters.dat");
    }
    finally {
        try {
            if (in != null) in.close();
        }
        catch (IOException ex) {
            // no big deal
        }
    }
}

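The table takes up 64KB of heap space, one byte for each of the 65,536 BMP code points. However, that's not really a problem on a desktop or server, and we only have to load this data once. The code is careful not to reload the data once it's already been loaded: the static initializer runs only when the Verifier class is first initialized, and it tries the second class loader only if flags is still null.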

Now that the lookup table is stored in memory, checking any property of any character is a simple matter of performing an array lookup followed by a couple of bitwise operations. Example 5-10 shows the new code for checking a noncolonized name, such as an element or attribute local name. All we have to do is look up the flags in the table and compare the bit corresponding to the desired property.
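Two details of Java make this lookup safe, and a small sketch can show both. A char is an unsigned 16-bit value, so any char is a valid index into a 65,536-entry array; and the & operator promotes both operands to int, so the result should be compared with zero rather than tested for being positive. The class LookupSketch, the has helper, and the HIGH_BIT flag are hypothetical and are not among the constants shown in this chapter.

```java
// Illustrative sketch only: LookupSketch, has, and HIGH_BIT are hypothetical names.
final class LookupSketch {

    static final byte NAME_START_CHARACTER = 4;

    // Hypothetical eighth flag. Stored in a byte, 0x80 has the value -128.
    static final byte HIGH_BIT = (byte) 0x80;

    // Stand-in for the real table: only two entries are populated here.
    static final byte[] flags = new byte[65536];
    static {
        flags['a'] = NAME_START_CHARACTER;
        flags[0xFFFF] = HIGH_BIT;
    }

    static boolean has(char c, byte property) {
        // c widens to an int in the range 0..65535, so the index is always
        // in bounds. The masked value is compared with zero, not tested
        // with "> 0", because the HIGH_BIT result is negative.
        return (flags[c] & property) != 0;
    }

    public static void main(String[] args) {
        System.out.println(has('a', NAME_START_CHARACTER));
        System.out.println(has('1', NAME_START_CHARACTER));
        System.out.println(has((char) 0xFFFF, HIGH_BIT));
    }
}
```

The check costs one array read and one bitwise AND per character, with no bounds logic beyond what the array access itself performs.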

Example 5-10. Using the lookup table to check a name

// constants for the bit flags in the characters lookup table
private final static byte XML_CHARACTER = 1;
private final static byte NAME_CHARACTER = 2;
private final static byte NAME_START_CHARACTER = 4;
private final static byte NCNAME_CHARACTER = 8;

static void checkNCName(String name) {
    if (name == null) {
        throwIllegalNameException(name, "NCNames cannot be null");
    }
    int length = name.length();
    if (length == 0) {
        throwIllegalNameException(name, "NCNames cannot be empty");
    }
    char first = name.charAt(0);
    if ((flags[first] & NAME_START_CHARACTER) == 0) {
        throwIllegalNameException(name, "NCNames cannot start " +
          "with the character " + Integer.toHexString(first));
    }
    for (int i = 1; i < length; i++) {
        char c = name.charAt(i);
        if ((flags[c] & NCNAME_CHARACTER) == 0) {
            if (c == ':') {
                throwIllegalNameException(name,
