Online Book Reader

Beautiful Code [115]

| [#x0CE6-#x0CEF] | [#x0D66-#x0D6F]

| [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29]

Extender ::= #x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6

| #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]

BaseChar ::= ...

| [#x00D8-#x00F6] | [#x00F8-#x00FF] | [#x0100-#x0131]

| [#x0134-#x013E] | [#x0141-#x0148] | [#x014A-#x017E]

| [#x0180-#x01C3] ...

CombiningChar ::= [#x0300-#x0345] | [#x0360-#x0361] | [#x0483-#x0486]

| [#x0591-#x05A1] | [#x05A3-#x05B9] | [#x05BB-#x05BD] | #x05BF

| [#x05C1-#x05C2] | #x05C4 | [#x064B-#x0652] | #x0670

| [#x06D6-#x06DC] | [#x06DD-#x06DF] | [#x06E0-#x06E4]

| [#x06E7-#x06E8] | [#x06EA-#x06ED]...

The complete set of rules would take up several pages here, as there are over 90,000 characters in Unicode to consider. In particular, the rules for BaseChar and CombiningChar have been shortened in this example.

To verify that a string is a legal XML name, it is necessary to iterate through each character in the string and verify that it is a legal name character as defined by the NameChar production.
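
As a rough illustration of that per-character check, the sketch below translates only the Extender production listed above into a Java predicate and applies a rule to every character of a string. The class name ExtenderSketch and the helper allMatch are hypothetical, and a real verifier would test each character against the full NameChar production rather than Extender alone.

```java
import java.util.function.IntPredicate;

class ExtenderSketch {

    // Hand translation of the Extender production shown above.
    static boolean isExtender(int c) {
        switch (c) {
            case 0x00B7: case 0x02D0: case 0x02D1: case 0x0387:
            case 0x0640: case 0x0E46: case 0x0EC6: case 0x3005:
                return true;
            default:
                break;
        }
        return (c >= 0x3031 && c <= 0x3035)
            || (c >= 0x309D && c <= 0x309E)
            || (c >= 0x30FC && c <= 0x30FE);
    }

    // Hypothetical helper: true only if every char in s passes the rule.
    // Every code point in the Extender production fits in one Java char.
    static boolean allMatch(String s, IntPredicate rule) {
        for (int i = 0; i < s.length(); i++) {
            if (!rule.test(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Middle dot followed by the ideographic iteration mark: both are extenders.
        System.out.println(allMatch("\u00B7\u3005", ExtenderSketch::isExtender));
        // 'a' is not an extender, so the whole string fails this rule.
        System.out.println(allMatch("a\u00B7", ExtenderSketch::isExtender));
    }
}
```

Passing the rule in as a predicate keeps the loop independent of any one production, so the same loop could be reused once the other productions have been translated.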

Correct, Beautiful, Fast (in That Order): Lessons from Designing XML Verifiers > Version 1: The Naïve Implementation

5.3. Version 1: The Naïve Implementation

My initial contribution to JDOM (shown in Example 5-2) simply deferred the rule checks to Java's Character class. The checkXMLName method returns an error message if an XML name is invalid, and null if it's valid. This itself is a questionable design; it should probably throw an exception if the name is invalid, and return void in all other cases. Later in this chapter, you'll see how future versions addressed this.
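
The alternative design mentioned above might look roughly like the following sketch. It is not the JDOM code: the class NameVerifier and the method requireXMLName are hypothetical, the check here reproduces only the null-or-empty rule as a stand-in, and IllegalArgumentException is used simply because it is a standard unchecked exception.

```java
class NameVerifier {

    // Stand-in for a message-returning check; only the
    // null-or-empty rule is reproduced in this sketch.
    static String checkXMLName(String name) {
        if (name == null || name.trim().isEmpty()) {
            return "XML names cannot be null or empty";
        }
        return null;
    }

    // Hypothetical wrapper: returns normally for an accepted name
    // and throws, carrying the error message, for a rejected one.
    static void requireXMLName(String name) {
        String problem = checkXMLName(name);
        if (problem != null) {
            throw new IllegalArgumentException(problem);
        }
    }

    public static void main(String[] args) {
        requireXMLName("title"); // accepted, so nothing happens
        try {
            requireXMLName("");
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

With this shape a caller cannot silently ignore a bad name, whereas a returned message is easy to forget to inspect.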

Example 5-2. The first version of name character verification

private static String checkXMLName(String name) {
    // Cannot be empty or null
    if ((name == null) || (name.length() == 0) || (name.trim().equals(""))) {
        return "XML names cannot be null or empty";
    }

    // Cannot start with a number
    char first = name.charAt(0);
    if (Character.isDigit(first)) {
        return "XML names cannot begin with a number.";
    }

    // Cannot start with a $
    if (first == '$') {
        return "XML names cannot begin with a dollar sign ($).";
    }

    // Cannot start with a -
    if (first == '-') {
        return "XML names cannot begin with a hyphen (-).";
    }

    // Ensure valid content
    for (int i = 0, len = name.length(); i < len; i++) {
        char c = name.charAt(i);
        if ((!Character.isLetterOrDigit(c))
                && (c != '-')
                && (c != '$')
                && (c != '_')) {
            return c + " is not allowed in XML names.";
        }
    }

    // We got here, so everything is OK
    return null;
}

This method was straightforward and easy to understand. Unfortunately, it was wrong. In particular:

It allowed names that contained colons. Because JDOM attempted to maintain namespace well-formedness, this had to be fixed.

The Java Character.isLetterOrDigit and Character.isDigit methods aren't perfectly aligned with XML's definition of letters and digits. Java considers some characters as letters that XML doesn't, and vice versa.

The Java rules change from one version of Java to the next. XML's rules don't.

Nonetheless, this was a reasonable first attempt. It did catch a large percentage of malformed names and didn't reject too many well-formed ones. It worked especially well in the common case when all the names were ASCII. Even so, JDOM strived for broader applicability than that. An improved implementation that actually followed XML's rules was called for.
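
The gaps can be observed with a few direct probes. This is a minimal sketch: the class Version1Probe and the helper acceptedByVersion1 are hypothetical, and the helper copies only the content test from the loop in Example 5-2. The remarks about what XML allows refer to the XML 1.0 name productions excerpted earlier, in which name characters include '.' and the listed digit ranges stop at #x0F29.

```java
class Version1Probe {

    // Content test from the loop in Example 5-2, copied for illustration.
    static boolean acceptedByVersion1(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '$' || c == '_';
    }

    public static void main(String[] args) {
        // FULLWIDTH DIGIT ONE is a digit to Java, but it lies outside
        // the digit ranges in the grammar excerpt.
        System.out.println(Character.isDigit('\uFF11'));   // prints true

        // A period is a legal XML name character, yet this test rejects it.
        System.out.println(acceptedByVersion1('.'));       // prints false

        // A dollar sign is not a legal XML name character, yet it passes.
        System.out.println(acceptedByVersion1('$'));       // prints true
    }
}
```

None of these cases involves an obscure script, which suggests why checking against the grammar itself is safer than borrowing a neighboring definition of letters and digits.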

5.4. Version 2: Imitating the BNF Grammar O(N)

My next contribution to JDOM manually translated the BNF productions into a series of if-else statements. The result looked like Example 5-3. You'll notice that this version is quite a bit more complicated.

Example 5-3. BNF-based name character verification

private static String checkXMLName(String name) {

// Cannot be empty or null
