Beautiful Code [125]

method called doCell:

    public void doCell(Parse cell, int columnNumber) {
        ignore(cell);
    }

The ignore method simply adds the color gray to the cell, indicating that the cell has been ignored:

    public static void ignore (Parse cell) {
        cell.addToTag(" bgcolor=\"#efefef\"");
        ignores++;
    }

As defined, it doesn't look like Fixture does much of anything at all. All it does is traverse a document and turn cells gray. However, subclasses of Fixture can override any of those methods and do different things. They can gather information, save information, communicate with the application under test, and mark cell values. Fixture defines the default sequence for traversing an HTML document.
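To make the override idea concrete, here is a minimal sketch. It does not use FIT itself: SketchFixture, CollectingFixture, and the list-of-lists "table" are hypothetical stand-ins invented for illustration, and the method names only imitate the doTables-to-doCell pattern described here. The base class walks the structure and does almost nothing; the subclass changes one step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Fixture: walks rows and cells and calls
// an overridable hook for each cell. Not the real FIT class.
class SketchFixture {
    int ignores = 0;

    public void doRows(List<List<String>> rows) {
        for (List<String> row : rows) {
            doRow(row);
        }
    }

    public void doRow(List<String> row) {
        for (int i = 0; i < row.size(); i++) {
            doCell(row.get(i), i);
        }
    }

    // Default action: just count the cell as ignored.
    public void doCell(String cell, int columnNumber) {
        ignores++;
    }
}

// A subclass overrides only the step it cares about;
// the traversal order is inherited unchanged.
class CollectingFixture extends SketchFixture {
    final List<String> seen = new ArrayList<>();

    @Override
    public void doCell(String cell, int columnNumber) {
        seen.add(columnNumber + ":" + cell);
    }
}

class FixtureSketchDemo {
    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("a", "b"),
            Arrays.asList("c"));
        CollectingFixture fixture = new CollectingFixture();
        fixture.doRows(rows);
        System.out.println(fixture.seen); // prints [0:a, 1:b, 0:c]
    }
}
```

Because every step of the walk is a public, overridable method, the subclass gets the whole traversal for free. That is also why the sequence can never change without breaking subclasses like this one.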

This is a very un-framework-y way of doing things. Users don't "plug into" this framework; they subclass a class and override some default actions. Also, there's no real cover for the framework designer. Technically, all a user needs to call is the doTables method, but the entire traversal sequence, from doTables down to doCell, is public. FIT will have to live with that traversal sequence forever. There's no way to change it without breaking client code. From a traditional framework perspective, this is bad, but what if you are confident in the traversal sequence? The sequence mirrors the parts of HTML that we care about, and it's very stable; it's hard to imagine HTML changing in a way that would break it. Living with it forever might be OK.

6.4. How Simple Can an HTML Parser Be?

In addition to being an open framework, FIT presents some other surprising design choices. Earlier, I mentioned that all of FIT's HTML parsing is done by the Parse class. One of the things that I love the most about the Parse class is that it constructs an entire tree with its constructors.

Here's how it works. You create an instance of the class with a string of HTML as a constructor argument:

    String input = read(new File(argv[0]));
    Parse parse = new Parse(input);

The Parse constructor recursively constructs a tree of Parse instances, each of which represents a portion of the HTML document. The parsing code is entirely within the constructors of Parse.

Each Parse instance has five public strings and two references to other Parse objects:

    public String leader;
    public String tag;
    public String body;
    public String end;
    public String trailer;

    public Parse more;
    public Parse parts;

When you construct your first Parse for an HTML document, in a sense, you've constructed all of them. From that point on, you can use more and parts to traverse nodes. Here's the parsing code in the Parse class:

    static String tags[] = {"table", "tr", "td"};

    public Parse (String text) throws ParseException {
        this (text, tags, 0, 0);
    }

    public Parse (String text, String tags[]) throws ParseException {
        this (text, tags, 0, 0);
    }

    public Parse (String text, String tags[], int level, int offset) throws ParseException
    {
        String lc = text.toLowerCase();
        int startTag = lc.indexOf("<"+tags[level]);
        int endTag = lc.indexOf(">", startTag) + 1;
        int startEnd = lc.indexOf("</"+tags[level], endTag);
        int endEnd = lc.indexOf(">", startEnd) + 1;
        int startMore = lc.indexOf("<"+tags[level], endEnd);
        if (startTag<0 || endTag<0 || startEnd<0 || endEnd<0) {
            throw new ParseException ("Can't find tag: "+tags[level], offset);
        }

        leader = text.substring(0,startTag);
        tag = text.substring(startTag, endTag);
        body = text.substring(endTag, startEnd);
        end = text.substring(startEnd,endEnd);
        trailer = text.substring(endEnd);

        if (level+1 < tags.length) {
            parts = new Parse (body, tags, level+1, offset+endTag);
            body = null;
        }

        if (startMore>=0) {
            more = new Parse (trailer, tags, level, offset+endEnd);
            trailer = null;
        }
    }
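To see how more and parts link the tree together, here is a small self-contained sketch. MiniParse is a condensed copy of the constructor logic above, renamed so that it is not mistaken for FIT's own class. The cellBodies helper and the sample HTML are hypothetical additions for illustration. ParseException is assumed to be java.text.ParseException, which has a matching (String, int) constructor.

```java
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;

class ParseWalk {

    // Condensed stand-in for the Parse class above: same fields, same algorithm.
    static class MiniParse {
        String leader, tag, body, end, trailer;
        MiniParse more, parts;

        MiniParse(String text, String[] tags, int level, int offset) throws ParseException {
            String lc = text.toLowerCase();
            int startTag = lc.indexOf("<" + tags[level]);
            int endTag = lc.indexOf(">", startTag) + 1;
            int startEnd = lc.indexOf("</" + tags[level], endTag);
            int endEnd = lc.indexOf(">", startEnd) + 1;
            int startMore = lc.indexOf("<" + tags[level], endEnd);
            if (startTag < 0 || endTag < 0 || startEnd < 0 || endEnd < 0) {
                throw new ParseException("Can't find tag: " + tags[level], offset);
            }
            leader = text.substring(0, startTag);
            tag = text.substring(startTag, endTag);
            body = text.substring(endTag, startEnd);
            end = text.substring(startEnd, endEnd);
            trailer = text.substring(endEnd);
            if (level + 1 < tags.length) {
                // Children live one tag level down, inside this node's body.
                parts = new MiniParse(body, tags, level + 1, offset + endTag);
                body = null;
            }
            if (startMore >= 0) {
                // Siblings live at the same tag level, inside this node's trailer.
                more = new MiniParse(trailer, tags, level, offset + endEnd);
                trailer = null;
            }
        }
    }

    static final String[] TAGS = {"table", "tr", "td"};

    // Hypothetical helper: collect the body text of every cell, in document order.
    static List<String> cellBodies(String html) throws ParseException {
        List<String> bodies = new ArrayList<>();
        for (MiniParse table = new MiniParse(html, TAGS, 0, 0); table != null; table = table.more) {
            for (MiniParse row = table.parts; row != null; row = row.more) {
                for (MiniParse cell = row.parts; cell != null; cell = cell.more) {
                    bodies.add(cell.body);
                }
            }
        }
        return bodies;
    }

    public static void main(String[] args) throws ParseException {
        String html = "<p>intro</p><table><tr><td>a</td><td>b</td></tr>"
                    + "<tr><td>c</td></tr></table>";
        System.out.println(cellBodies(html)); // prints [a, b, c]
    }
}
```

Following parts goes down a level (table to row, row to cell), and following more moves sideways to the next sibling at the same level. Only the cells keep a non-null body, because every higher level hands its body to the child constructor.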

One of the most interesting things about Parse is that it represents the entire HTML document. The leader string holds all of the text
