Online Book Reader


Beautiful Code [188]

value, given key. */

char *invQueryOne; /* Query that returns key, given value. */

};

The structure starts with data shared by all types of columns. Next come the polymorphic methods. Finally, there's a section containing type-specific data.

Each column object contains space for the data of all types of columns. It would be possible, using a union or some related mechanism, to avoid this waste of space. However, this would complicate the use of the type-specific fields, and because there are fewer than 100 columns, the total space saved would be no more than a few kilobytes.

Most of the functionality of the program resides in the column methods. A column knows how to retrieve data for a particular gene either as a string or as HTML. A column can search for genes where the column data fits a simple search string. The columns also implement the interactive controls to filter data, and the routine to do the filtering itself.
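The layout described above (shared data, then polymorphic methods, then type-specific data) can be sketched in C with function pointers. This is a minimal illustration, not the program's actual struct: `demoColumn`, the "fixed" and "lookup" column types, and all function names here are hypothetical, and the "lookup" method only formats a string where real code would query a database.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified column object (not the book's actual struct). */
struct demoColumn {
    /* Data shared by all column types. */
    const char *name;

    /* Polymorphic method: returns a malloc'd string for one gene. */
    char *(*cellVal)(const struct demoColumn *col, const char *geneId);

    /* Type-specific data. Every column carries every field,
     * even though each type reads only its own. */
    const char *constant;   /* Used by the "fixed" type only. */
    const char *table;      /* Used by the "lookup" type only. */
};

static char *fixedCellVal(const struct demoColumn *col, const char *geneId)
{
    size_t n = strlen(col->constant) + 1;
    char *copy = malloc(n);
    (void)geneId;                       /* Same value for every gene. */
    if (copy != NULL)
        memcpy(copy, col->constant, n);
    return copy;
}

static char *lookupCellVal(const struct demoColumn *col, const char *geneId)
{
    /* Stand-in for a database query: formats "<table>/<geneId>". */
    size_t n = strlen(col->table) + strlen(geneId) + 2;
    char *buf = malloc(n);
    if (buf != NULL)
        snprintf(buf, n, "%s/%s", col->table, geneId);
    return buf;
}

struct demoColumn demoFixedColumn(const char *name, const char *constant)
{
    struct demoColumn col = {0};
    col.name = name;
    col.cellVal = fixedCellVal;
    col.constant = constant;
    return col;
}

struct demoColumn demoLookupColumn(const char *name, const char *table)
{
    struct demoColumn col = {0};
    col.name = name;
    col.cellVal = lookupCellVal;
    col.table = table;
    return col;
}

/* Generic code: renders a table cell without knowing the column type.
 * Returns 1 on success, 0 if the method could not produce a value. */
int demoCellHtml(const struct demoColumn *col, const char *geneId,
                 char *out, size_t outSize)
{
    char *val = col->cellVal(col, geneId);
    if (val == NULL)
        return 0;
    snprintf(out, outSize, "<td>%s</td>", val);
    free(val);
    return 1;
}
```

The caller of `demoCellHtml` never branches on the column type; the method pointer installed when the column was built decides what happens. The unused type-specific fields in each object are the wasted space the text mentions.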

The columns are created by a factory routine based on information in the columnDb.ra files. An excerpt of one of these files is shown in Example 13-2. All columnDb records contain fields describing the column name, user-visible short and long labels, the default location of the column in the table (priority), whether the column is visible by default, and a type field. The type field controls what methods the column has. There may be additional fields, some of which are type-specific. In many cases, the SQL used to query the tables in the database associated with a column is included in the columnDb record, as well as a URL to hyperlink to each item in the column.
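A factory of this kind might look roughly like the following sketch, which reads the first word of the type field and installs the matching method. It handles only a `lookup` type and assumes that the three words after `lookup` are the table, the key field, and the value field; that reading of the type line, along with the names `sketchColumn`, `sketchColumnInit`, and `buildQuery`, is an assumption made for illustration.

```c
#include <stdio.h>
#include <string.h>

enum { WORD_MAX = 64 };     /* The %63s widths below must stay WORD_MAX - 1. */

/* Hypothetical column object for this sketch only. */
struct sketchColumn {
    char name[WORD_MAX];

    /* Method: writes the SQL for one gene into buf. */
    int (*buildQuery)(const struct sketchColumn *col, const char *geneId,
                      char *buf, size_t bufSize);

    /* Type-specific data for "lookup" columns. */
    char table[WORD_MAX];
    char keyField[WORD_MAX];
    char valField[WORD_MAX];
};

static int lookupBuildQuery(const struct sketchColumn *col, const char *geneId,
                            char *buf, size_t bufSize)
{
    /* Sketch only: real code must escape geneId before building SQL. */
    return snprintf(buf, bufSize, "select %s from %s where %s = '%s'",
                    col->valField, col->table, col->keyField, geneId);
}

/* Factory: fill in col from a record's name and type fields.
 * Returns 1 on success, 0 for an unknown or malformed type. */
int sketchColumnInit(struct sketchColumn *col, const char *name,
                     const char *typeField)
{
    char kind[WORD_MAX];

    memset(col, 0, sizeof(*col));
    snprintf(col->name, sizeof(col->name), "%s", name);

    if (sscanf(typeField, "%63s", kind) != 1)
        return 0;

    if (strcmp(kind, "lookup") == 0) {
        /* Assumed layout: lookup <table> <keyField> <valField> */
        if (sscanf(typeField, "%*s %63s %63s %63s",
                   col->table, col->keyField, col->valField) != 3)
            return 0;
        col->buildQuery = lookupBuildQuery;
        return 1;
    }

    return 0;   /* Other column types are omitted from this sketch. */
}
```

Supporting another type would mean adding one more branch that parses its own arguments and installs its own functions, which is how the type field ends up controlling what methods the column has.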

Example 13-2. A section of a columnDb.ra file containing metadata on the columns


name proteinName
shortLabel UniProt
longLabel UniProt (SwissProt/TrEMBL) Protein Display ID
priority 2.1
visibility off
type association kgXref
queryFull select kgID,spDisplayID from kgXref
queryOne select spDisplayId,spID from kgXref where kgID = '%s'
invQueryOne select kgID from kgXref where spDisplayId = '%s'
search fuzzy
itemUrl http://us.expasy.org/cgi-bin/niceprot.pl?%s

name proteinAcc
shortLabel UniProt Acc
longLabel UniProt (SwissProt/TrEMBL) Protein Accession
priority 2.15
visibility off
type lookup kgXref kgID spID
search exact
itemUrl http://us.expasy.org/cgi-bin/niceprot.pl?%s

name refSeq
shortLabel RefSeq
longLabel NCBI RefSeq Gene Accession
priority 2.2
visibility off
type lookup knownToRefSeq name value
search exact
itemUrl http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=Nucleotide&term=%s&doptcmdl=GenBank&tool=genome.ucsc.edu

The format of a columnDb.ra file is simple: one field per line, and records separated by blank lines. Each line begins with the field name, and the remainder of the line is the field value.
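Because the format is this regular, a reader for it needs very little code. The sketch below scans .ra-style text held in memory and returns the value of one field from one record. It is not the program's actual parser: the function names are hypothetical, lines longer than the local buffer are truncated, and it assumes that `name` is the first field of each record, as it is in Example 13-2.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Split a line in place into field name and value.
 * Returns 0 for a blank line, 1 otherwise. */
int raSplitLine(char *line, char **retName, char **retValue)
{
    char *s = line;
    char *end;

    while (*s != '\0' && isspace((unsigned char)*s))
        ++s;
    if (*s == '\0')
        return 0;
    *retName = s;

    /* The field name runs to the first whitespace. */
    while (*s != '\0' && !isspace((unsigned char)*s))
        ++s;
    if (*s != '\0')
        *s++ = '\0';

    /* The value is the remainder of the line, trimmed. */
    while (*s != '\0' && isspace((unsigned char)*s))
        ++s;
    *retValue = s;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        --end;
    *end = '\0';
    return 1;
}

/* Find `field` in the record whose name field equals recordName.
 * Copies the value into out and returns 1, or returns 0 if not found. */
int raFindValue(const char *text, const char *recordName, const char *field,
                char *out, size_t outSize)
{
    const char *p = text;
    int inWanted = 0;

    while (*p != '\0') {
        char line[1024];
        char *name, *value;
        size_t len = strcspn(p, "\n");
        size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;

        memcpy(line, p, n);
        line[n] = '\0';
        p += len;
        if (*p == '\n')
            ++p;

        if (!raSplitLine(line, &name, &value)) {
            inWanted = 0;               /* A blank line ends the record. */
            continue;
        }
        if (strcmp(name, "name") == 0)
            inWanted = (strcmp(value, recordName) == 0);
        if (inWanted && strcmp(name, field) == 0) {
            snprintf(out, outSize, "%s", value);
            return 1;
        }
    }
    return 0;
}
```

Given the text of Example 13-2, asking for the `type` field of the `proteinAcc` record would yield `lookup kgXref kgID spID`.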

This simple, line-oriented format is used for a lot of the metadata at http://genome.ucsc.edu. At one point, we considered using indexed versions of these files as an alternative to a relational database (.ra stands for relational alternative). But there are a tremendous number of good tools associated with relational databases, so we decided to keep the bulk of our data relational. The .ra files are very easy to read, edit, and parse, though, so they see continued use in applications such as these.

The columnDb.ra files are arranged in a three-level directory hierarchy. At the root lies information about columns that appear for all organisms. The mid-level contains information that is organism-specific. As our understanding of a particular organism's genome progresses, we'll have different assemblies of its DNA sequence. The lowest level contains information that is assembly-specific.

The code that reads a columnDb constructs a hash of hashes, where the outer hash is keyed by the column name and the inner hashes are keyed by the field name. Information at the lower levels can contain entirely new records, or add or override particular fields of records first defined at a higher level.
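The add-or-override behavior can be illustrated with a small sketch. To stay short it uses fixed-size arrays and linear search in place of real hash tables, so it shows only the two-level keying and the "later level wins" rule, not the actual data structure. All names and size limits are hypothetical.

```c
#include <stdio.h>
#include <string.h>

enum { MAX_RECORDS = 16, MAX_FIELDS = 16, TEXT_MAX = 64 };

struct sketchField  { char name[TEXT_MAX]; char value[TEXT_MAX]; };
struct sketchRecord { char name[TEXT_MAX];
                      struct sketchField fields[MAX_FIELDS];
                      int fieldCount; };
struct sketchDb     { struct sketchRecord records[MAX_RECORDS];
                      int recordCount; };

/* Outer lookup: find a record by column name, creating it if absent.
 * Returns NULL if the table is full. */
static struct sketchRecord *dbRecord(struct sketchDb *db, const char *name)
{
    struct sketchRecord *rec;
    int i;

    for (i = 0; i < db->recordCount; ++i)
        if (strcmp(db->records[i].name, name) == 0)
            return &db->records[i];
    if (db->recordCount >= MAX_RECORDS)
        return NULL;
    rec = &db->records[db->recordCount++];
    memset(rec, 0, sizeof(*rec));
    snprintf(rec->name, sizeof(rec->name), "%s", name);
    return rec;
}

/* Inner lookup: add a field, or override it if already present.
 * Calling this for root, then organism, then assembly files makes
 * the most specific level win. Returns 1 on success, 0 if full. */
int dbSet(struct sketchDb *db, const char *recName,
          const char *field, const char *value)
{
    struct sketchRecord *rec = dbRecord(db, recName);
    int i;

    if (rec == NULL)
        return 0;
    for (i = 0; i < rec->fieldCount; ++i)
        if (strcmp(rec->fields[i].name, field) == 0)
            break;
    if (i == rec->fieldCount) {
        if (rec->fieldCount >= MAX_FIELDS)
            return 0;
        snprintf(rec->fields[i].name, sizeof(rec->fields[i].name), "%s", field);
        rec->fieldCount++;
    }
    snprintf(rec->fields[i].value, sizeof(rec->fields[i].value), "%s", value);
    return 1;
}

/* Returns the current value of a field, or NULL if it is not set. */
const char *dbGet(const struct sketchDb *db, const char *recName,
                  const char *field)
{
    int i, j;

    for (i = 0; i < db->recordCount; ++i) {
        const struct sketchRecord *rec = &db->records[i];
        if (strcmp(rec->name, recName) != 0)
            continue;
        for (j = 0; j < rec->fieldCount; ++j)
            if (strcmp(rec->fields[j].name, field) == 0)
                return rec->fields[j].value;
    }
    return NULL;
}
```

For example, if the root level sets `visibility` to `off` for `refSeq` and an organism-level file later sets it to `on`, `dbGet` returns `on`, while fields the lower level never mentions keep their root-level values.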

Some types of columns correspond very directly to columns in the
