Beautiful Code [189]
If the relational table can contain more than one row per gene, its type becomes association. Associations with multiple values for a single gene are displayed as a comma-separated list in the Gene Sorter. Associations include the SQL code to fetch the data either one gene at a time (queryOne), for all genes (queryFull), or for the genes associated with a particular value (invQueryOne). The queryOne SQL actually returns two values, one to display in the Gene Sorter and another to use in the hyperlink, although these can be the same.
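As a rough illustration of the three association queries, the following Python sketch runs them against an in-memory SQLite table. The table name geneGo, its columns, the sample rows, and the SQL strings are hypothetical stand-ins rather than the Gene Sorter's actual schema or configuration; only the roles of queryOne, queryFull, and invQueryOne are taken from the text.

```python
import sqlite3

# Hypothetical association table: a gene may appear in more than one row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geneGo (geneId TEXT, term TEXT, termId TEXT)")
conn.executemany(
    "INSERT INTO geneGo VALUES (?, ?, ?)",
    [
        ("g1", "kinase", "T1"),
        ("g1", "membrane", "T2"),
        ("g2", "kinase", "T1"),
    ],
)

# Illustrative SQL for the three roles named in the text.
# QUERY_ONE returns two values per row: one to display, one for the hyperlink.
QUERY_ONE = "SELECT term, termId FROM geneGo WHERE geneId = ? ORDER BY term"
QUERY_FULL = "SELECT geneId, term FROM geneGo ORDER BY geneId, term"
INV_QUERY_ONE = "SELECT geneId FROM geneGo WHERE term = ? ORDER BY geneId"


def cell_text(gene_id):
    """Join every display value for one gene into a comma-separated cell."""
    rows = conn.execute(QUERY_ONE, (gene_id,)).fetchall()
    return ",".join(display for display, _link in rows)


def all_cells():
    """Build every gene's cell from a single pass over the whole table."""
    cells = {}
    for gene_id, term in conn.execute(QUERY_FULL):
        cells.setdefault(gene_id, []).append(term)
    return {gene_id: ",".join(terms) for gene_id, terms in cells.items()}


def genes_with_value(value):
    """Return the genes associated with one particular value."""
    return [gene_id for (gene_id,) in conn.execute(INV_QUERY_ONE, (value,))]
```

Fetching one gene at a time suits a page that shows only a few dozen rows, while the full-table query avoids issuing one query per gene when every gene must be examined.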
Most of the columns in the Gene Sorter are of type lookup or association, and given any relational table that is keyed by gene ID, it is a simple matter to make it into Gene Sorter columns.
Other columns, such as the gene expression columns, are relatively complex. Figure 13-1 shows a gene expression column as colored boxes underneath the names of various organs such as brain, liver, kidney, etc. The colors indicate how much of the mRNA for the gene is found in these specific organs in comparison to the level of mRNA in the body as a whole. Red indicates a higher-than-average expression, green a lower-than-average expression, and black an average expression level.
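The color scheme can be sketched in a few lines of Python. This is only an illustration: it assumes the expression level is available as a log ratio against the whole-body level (positive for higher than average, negative for lower), and the function name, the saturation limit max_abs, and the linear brightness scale are all hypothetical choices, not details given in the text.

```python
def expression_color(log_ratio, max_abs=3.0):
    """Map a log ratio to an HTML color.

    Positive values shade toward red, negative values toward green,
    and zero (average expression) is black.
    """
    # Clamp to [-max_abs, max_abs] so extreme values saturate.
    clamped = max(-max_abs, min(max_abs, log_ratio))
    # Brightness grows linearly with distance from the average (assumed scale).
    level = int(round(255 * abs(clamped) / max_abs))
    if clamped > 0:
        return "#%02X0000" % level
    if clamped < 0:
        return "#00%02X00" % level
    return "#000000"
```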
The entire set of gene expression information from fetal brain to testis in Figure 13-1 is considered a single Gene Sorter column. It's broken into three columns from the HTML table point of view, to provide the gray lines between groups of five organs for better readability.
13.4. Filtering Down to Just the Relevant Genes
Filters are one of the most powerful features of the Gene Sorter. Filters can be applied to each of the columns in order to view just the genes relevant to a particular purpose. For instance, a filter on the gene expression column can be used to find genes that are expressed in the brain but not in other tissues. A filter on the genome position can find genes on the X chromosome. A combination of these filters could find brain-specific genes on the X chromosome. These genes would be particularly interesting to researchers studying autism, since that condition appears to be sex-linked to a fairly strong degree.
Each column has two filter methods: filterControls, which writes the HTML for the filter user interface, and advFilter, which actually runs the filter. These two methods communicate with each other through cart variables. The variables follow a naming convention in which the program name, the letters "as", and the column name serve as prefixes to the specific variable name. In this way, different columns of the same type have different cart variables, and filter variables can be distinguished from other variables. A helpful routine named cartFindPrefix, which returns a list of all variables with a given prefix, is heavily used by the filter system.
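The naming convention can be sketched as follows, with the cart modeled as a plain Python dictionary. The program name sorter, the dot separator, the variable names, and the Python function cart_find_prefix are assumptions made for illustration; the text specifies only the order of the prefixes and the purpose of cartFindPrefix.

```python
def filter_var_name(program, column, var):
    """Build a cart variable name: program, the letters "as", column, variable."""
    return ".".join([program, "as", column, var])


def cart_find_prefix(cart, prefix):
    """Return (name, value) pairs for cart variables starting with prefix."""
    return sorted((k, v) for k, v in cart.items() if k.startswith(prefix))


# Hypothetical cart contents: two filter variables and one unrelated setting.
cart = {
    "sorter.as.position.chrom": "chrX",
    "sorter.as.expression.min": "1.5",
    "sorter.order": "expression",  # not a filter variable
}
```

With this layout, the prefix sorter.as. selects every filter variable, and sorter.as.position. selects only those belonging to the position column.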
The filters are arranged as a chain. Initially, the program constructs a list of all genes. Next, it checks the cart to see whether any filters are set. If so, it calls the filters for each column. The first filter gets the entire gene list as input. Subsequent filters start with the output of the previous filter. The order in which the filters are applied doesn't matter, since each filter only removes genes that fail its own test.
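A minimal Python sketch of such a chain appears below. The gene records, the cart keys, and the two filter functions are hypothetical, and each filter is assumed to pass its input through unchanged when its cart variable is not set. The point is only that each filter consumes the previous filter's output and that either ordering yields the same genes.

```python
# Hypothetical gene records for illustration.
GENES = [
    {"id": "g1", "chrom": "chrX", "brain": 2.0},
    {"id": "g2", "chrom": "chr1", "brain": 2.5},
    {"id": "g3", "chrom": "chrX", "brain": -1.0},
]


def chrom_filter(genes, cart):
    """Keep genes on the requested chromosome, if one is set in the cart."""
    wanted = cart.get("position.chrom")
    if wanted is None:
        return genes
    return [g for g in genes if g["chrom"] == wanted]


def brain_filter(genes, cart):
    """Keep genes whose brain expression meets the minimum, if one is set."""
    minimum = cart.get("expression.brainMin")
    if minimum is None:
        return genes
    return [g for g in genes if g["brain"] >= float(minimum)]


def run_filter_chain(all_genes, filters, cart):
    """Feed the full gene list to the first filter, then chain the outputs."""
    genes = list(all_genes)
    for column_filter in filters:
        genes = column_filter(genes, cart)
    return genes


cart = {"position.chrom": "chrX", "expression.brainMin": "1.0"}
one_way = run_filter_chain(GENES, [chrom_filter, brain_filter], cart)
other_way = run_filter_chain(GENES, [brain_filter, chrom_filter], cart)
```

Although the result does not depend on the order, running a cheap, highly selective filter first would leave less work for the filters that follow.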
The filters are the most speed-critical code in the Gene Sorter. Most of the code is executed on just 50 or 100 genes, but the filters work on tens of thousands. To keep response time interactive, a filter should spend less than 0.0001 second per gene. A modern CPU is fast enough that 0.0001 s is generally not much of a limitation. However, a disk seek still takes about 0.005 s, so the filter must avoid causing seeks.
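The arithmetic behind this budget can be worked through in a few lines of Python. The gene count of 20,000 is an assumed stand-in for "tens of thousands"; the two timings are the figures quoted above.

```python
GENES = 20_000        # assumed stand-in for "tens of thousands"
CPU_BUDGET = 0.0001   # seconds per gene, from the text
SEEK_TIME = 0.005     # seconds per disk seek, from the text

# Total time for one filter that stays within the per-gene budget.
cpu_total = GENES * CPU_BUDGET      # about 2 seconds

# Total time if the filter caused one disk seek for every gene.
seek_total = GENES * SEEK_TIME      # about 100 seconds

# How many genes' worth of budget a single seek consumes.
genes_per_seek = SEEK_TIME / CPU_BUDGET   # about 50
```

Under these assumptions, a filter that seeks once per gene would take minutes rather than seconds, which is why avoiding seeks matters more than shaving CPU time.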
Most filters start by checking the cart to see whether any of their variables are set, and if not, just