Beautiful Code [116]
    if ((name == null) || (name.length() == 0)
            || (name.trim().equals(""))) {
        return "XML names cannot be null or empty";
    }

    // Cannot start with a number
    char first = name.charAt(0);
    if (!isXMLNameStartCharacter(first)) {
        return "XML names cannot begin with the character \"" +
               first + "\"";
    }

    // Ensure valid content
    for (int i=0, len = name.length(); i<len; i++) {
        char c = name.charAt(i);
        if (!isXMLNameCharacter(c)) {
            return "XML names cannot contain the character \"" + c + "\"";
        }
    }

    // We got here, so everything is OK
    return null;
}

public static boolean isXMLNameCharacter(char c) {
    return (isXMLLetter(c) || isXMLDigit(c) || c == '.' || c == '-'
            || c == '_' || c == ':' || isXMLCombiningChar(c)
            || isXMLExtender(c));
}

public static boolean isXMLNameStartCharacter(char c) {
    return (isXMLLetter(c) || c == '_' || c == ':');
}

Instead of simply reusing Java's Character.isLetterOrDigit and Character.isDigit methods, the checkXMLName method in Example 5-3 delegates the checks to isXMLNameCharacter and isXMLNameStartCharacter. These methods further delegate to methods matching the other BNF productions for the different types of characters: letters, digits, combining characters, and extenders. Example 5-4 shows one of these methods, isXMLDigit. Notice that this method considers not only the ASCII digits, but also the other digit characters included in Unicode 2.0. The isXMLLetter, isXMLCombiningChar, and isXMLExtender methods follow the same pattern. They're just longer.

Example 5-4. XML-based digit character verification

public static boolean isXMLDigit(char c) {
    if (c >= 0x0030 && c <= 0x0039) return true;
    if (c >= 0x0660 && c <= 0x0669) return true;
    if (c >= 0x06F0 && c <= 0x06F9) return true;
    if (c >= 0x0966 && c <= 0x096F) return true;
    if (c >= 0x09E6 && c <= 0x09EF) return true;
    if (c >= 0x0A66 && c <= 0x0A6F) return true;
    if (c >= 0x0AE6 && c <= 0x0AEF) return true;
    if (c >= 0x0B66 && c <= 0x0B6F) return true;
    if (c >= 0x0BE7 && c <= 0x0BEF) return true;
    if (c >= 0x0C66 && c <= 0x0C6F) return true;
    if (c >= 0x0CE6 && c <= 0x0CEF) return true;
    if (c >= 0x0D66 && c <= 0x0D6F) return true;
    if (c >= 0x0E50 && c <= 0x0E59) return true;
    if (c >= 0x0ED0 && c <= 0x0ED9) return true;
    if (c >= 0x0F20 && c <= 0x0F29) return true;
    return false;
}

This approach satisfied the basic goals of the upgrade.
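
To make the difference from Java's built-in checks concrete, the following is a minimal sketch, not JDOM code. The class name XmlDigitDemo and the range table are illustrative assumptions; the ranges themselves are copied from Example 5-4, restated as a table scan to keep the sketch short. It compares the XML digit check with Character.isDigit on a few sample characters.

```java
class XmlDigitDemo {
    // Inclusive code point ranges copied from Example 5-4.
    private static final int[][] DIGIT_RANGES = {
        {0x0030, 0x0039}, {0x0660, 0x0669}, {0x06F0, 0x06F9},
        {0x0966, 0x096F}, {0x09E6, 0x09EF}, {0x0A66, 0x0A6F},
        {0x0AE6, 0x0AEF}, {0x0B66, 0x0B6F}, {0x0BE7, 0x0BEF},
        {0x0C66, 0x0C6F}, {0x0CE6, 0x0CEF}, {0x0D66, 0x0D6F},
        {0x0E50, 0x0E59}, {0x0ED0, 0x0ED9}, {0x0F20, 0x0F29},
    };

    // Hypothetical restatement of isXMLDigit: same ranges as the chain of
    // if statements in Example 5-4, checked one after another.
    static boolean isXMLDigit(char c) {
        for (int[] range : DIGIT_RANGES) {
            if (c >= range[0] && c <= range[1]) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        char asciiSeven = '7';
        char arabicIndicNine = 0x0669;  // ARABIC-INDIC DIGIT NINE
        char fullwidthZero = 0xFF10;    // FULLWIDTH DIGIT ZERO

        for (char c : new char[] {asciiSeven, arabicIndicNine, fullwidthZero}) {
            System.out.println("U+" + Integer.toHexString(c).toUpperCase()
                + "  Character.isDigit=" + Character.isDigit(c)
                + "  isXMLDigit=" + isXMLDigit(c));
        }
    }
}
```

The fullwidth zero (U+FF10) is the interesting case: Character.isDigit accepts it, but it falls outside every range in Example 5-4, so a verifier built on the Java method alone would disagree with the ranges the example encodes.
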
It worked, and its operation was obvious. There was a clear mapping from the XML specification to the code. We could declare victory and go home.

Well, not quite. This was where the ugly specter of performance raised its head.

5.5. Version 3: First Optimization O(log N)

As Donald Knuth once said, "Premature optimization is the root of all evil in programming." However, although optimization matters less often than programmers think, it does matter; and this was one of the minority cases where it matters. Profiling proved that JDOM was spending a significant chunk of time performing verification. Every name character required several checks, and JDOM recognized a nonname character only after checking it first against every possible name character. Consequently, the number of checks increased in direct proportion to the code point value.

The project maintainers were beginning to grumble that maybe verification wasn't so important after all, and they might make it optional or ditch it entirely. Now, personally, I'm not willing to compromise correctness in the name of faster code, but it was apparent that the decision was going to be taken out of my hands if someone didn't do something. Fortunately, Jason Hunter did.

Hunter restructured my naïve code in a very clever way, shown in Example 5-5. Previously, even the common case where a character was legal required over 100 tests for each of the possible ranges of illegal characters. Hunter noticed that