Online Book Reader


Beautiful Code [170]

data: the positions of two protein-coding genes. Each gene has an internal structure that indicates which parts encode protein (dark gray, but blue in the original) and which have other functions (lighter gray). Notice how the coding of the leftmost gene (F56C11.2) corresponds pretty well to the dark-gray, highly similar regions in the WABA alignment track; this is because the protein-coding regions of genes tend to be very strongly conserved across species.

The gene named F56C11.6 is annotated with a function ("carboxylesterases"), indicating that it is related to a family of proteins responsible for a core part of carbon metabolism. In addition, it is shown with two alternative forms, indicating that it can encode more than one distinct protein. The two alternative forms are grouped together and given a distinct label. Notice that there are numerous alignments beneath this gene; this reflects the fact that the gene belongs to a large family of related genes, each of which contributes a different alignment.

The actual DNA nucleotide sequence is not shown in this representation because it isn't physically possible to squeeze a line of 20,000 base pairs into 800 pixels. However, when viewing smaller segments of a genome, Bio::Graphics can draw in the actual letters and ornament them (e.g., change their color or highlighting) to show the start and stop positions of interesting features.
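The arithmetic behind this limit is easy to check. The following Python sketch is not part of Bio::Graphics; the function names and the 6-pixel minimum character width are assumptions chosen only for illustration.

```python
def pixels_per_base(region_length_bp: int, image_width_px: int) -> float:
    """Horizontal pixels available to each base pair."""
    return image_width_px / region_length_bp


def can_draw_letters(region_length_bp: int, image_width_px: int,
                     min_char_width_px: float = 6.0) -> bool:
    """True if every base gets at least one character's width.

    The 6-pixel default is an assumed width for a small fixed-width font.
    """
    return pixels_per_base(region_length_bp, image_width_px) >= min_char_width_px


# 20,000 bp in 800 px leaves a small fraction of a pixel per base,
# so individual letters cannot be drawn at that scale.
whole_region = can_draw_letters(20_000, 800)

# A 100 bp window in the same image leaves 8 px per base.
small_window = can_draw_letters(100, 800)
```

Under these assumptions, letters only become drawable once the displayed segment shrinks to roughly a hundred base pairs or so.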

12.1.2. Bio::Graphics Requirements

The job of Bio::Graphics is to take a series of genome annotations (called features in BioPerl parlance) and output a graphics file formatted according to the programmer's specifications. Each feature has a start position, an end position, and a direction (pointing left, pointing right, or not pointing in either direction). Features may be associated with other attributes such as a name, a description, and a numeric value. Features may also have an internal structure and contain subfeatures and sub-subfeatures.
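To make the shape of this data concrete, here is a minimal Python sketch of a feature as described above. It is an illustration only, not BioPerl's feature interface: the class name, field names, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Feature:
    """Hypothetical stand-in for a genome annotation ("feature")."""
    start: int
    end: int
    strand: int = 0                      # +1 points right, -1 points left, 0 has no direction
    name: Optional[str] = None
    description: Optional[str] = None
    score: Optional[float] = None        # optional numeric value
    subfeatures: List["Feature"] = field(default_factory=list)

    def depth(self) -> int:
        """Number of nesting levels, counting this feature as level 1."""
        return 1 + max((sub.depth() for sub in self.subfeatures), default=0)


# A made-up gene with two coding segments as subfeatures.
example_gene = Feature(
    start=100, end=900, strand=+1, name="example_gene",
    subfeatures=[Feature(100, 300, +1), Feature(500, 900, +1)],
)
```

Because `subfeatures` holds more `Feature` objects, sub-subfeatures need no extra machinery: any code that walks the structure simply recurses.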

When designing this library, I had to address the following issues:

Open-ended nature of the problem

There are already a large number of genome annotation types, and the number is growing daily. While many annotations can be drawn with simple rectangles of different colors, many of them—particularly the quantitative ones—can be quite complex to represent. Furthermore, different bioinformaticians may want to represent the same annotation type differently; for example, there are many different ways of representing genes, each best suited for different circumstances.

In order to accommodate the open-ended nature of this problem, I wanted to make it easy to add new visual representations to Bio::Graphics and to extend existing ones. Each representation should be highly configurable; for example, the programmer should be able to exercise exquisite control over the height, weight, boundary color, and fill color of even the simple rectangle. Furthermore, the programmer should be able to alter how each feature is rendered on a case-by-case basis.
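One way to meet both goals, sketched here in Python rather than in the library's own Perl, is a registry of named representations whose options may be either fixed values or functions of the feature. Every name below (`GLYPHS`, `register_glyph`, `render`, the option keys) is hypothetical and is not the Bio::Graphics API; the "drawing" is reduced to a string so the example stays self-contained.

```python
from typing import Any, Callable, Dict

# Registry mapping a representation name to the function that draws it.
GLYPHS: Dict[str, Callable[[dict, dict], str]] = {}


def register_glyph(name: str):
    """Decorator that adds a new visual representation to the registry."""
    def decorator(func):
        GLYPHS[name] = func
        return func
    return decorator


def resolve(options: Dict[str, Any], feature: dict) -> Dict[str, Any]:
    """Call any option that is a function with the feature; pass others through."""
    return {key: (value(feature) if callable(value) else value)
            for key, value in options.items()}


@register_glyph("box")
def draw_box(feature: dict, opts: dict) -> str:
    # A real renderer would draw pixels; this returns a description instead.
    return (f"box[{feature['start']}..{feature['end']}] "
            f"fill={opts['fill']} height={opts['height']}")


def render(feature: dict, glyph: str, **options: Any) -> str:
    defaults = {"height": 10, "fill": "gray"}
    return GLYPHS[glyph](feature, resolve({**defaults, **options}, feature))


# The fill color is decided per feature, case by case.
line = render({"start": 1, "end": 5, "score": 0.9}, "box",
              fill=lambda f: "blue" if f["score"] > 0.5 else "gray")
```

Adding a new representation means registering one more function, and per-feature control comes from passing a function wherever a constant option would otherwise go.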

Feature density

Some genomic features are very dense, whereas others are more sparse (compare the "Genes" and "WABA alignments" tracks in Figure 12-1). Features can also overlap spatially. Sometimes you want overlapping features to partially obscure each other, and sometimes you'd like to practice collision control, shifting the overlapping features up or down vertically to make them distinct. To add to the problem, features can contain subfeatures that themselves overlap, so collision control and spatial layout need to work in a recursive fashion.

I thought it would be good if collision control could be activated and deactivated in a context-dependent manner. For example, if there are thousands of overlapping features in the region being displayed, collision control might cause a track to become unmanageably busy and should be turned off. The programmer should be able to control at what point this context-sensitive collision control kicks in, or override it entirely.
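The following Python sketch illustrates the idea of threshold-controlled collision control with a simple greedy row assignment. It is an assumption-laden illustration, not the layout algorithm Bio::Graphics uses: the function name, the `max_bump` parameter, and its default are hypothetical, and the sketch handles a single level of features rather than recursing into subfeatures.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), both inclusive


def bump_layout(features: List[Interval], max_bump: int = 50) -> List[int]:
    """Return a row index for each feature, in input order.

    Overlapping features are shifted to different rows. If there are more
    than max_bump features, collision control is switched off and every
    feature is placed in row 0, so features may obscure each other.
    """
    if len(features) > max_bump:
        return [0] * len(features)

    rows = [0] * len(features)
    row_ends: List[int] = []  # rightmost end position used in each row

    # Place features left to right, each in the first row where it fits.
    for i in sorted(range(len(features)), key=lambda i: features[i][0]):
        start, end = features[i]
        for row, last_end in enumerate(row_ends):
            if start > last_end:
                row_ends[row] = end
                rows[i] = row
                break
        else:
            row_ends.append(end)
            rows[i] = len(row_ends) - 1
    return rows


bumped = bump_layout([(1, 10), (5, 15), (12, 20)])
unbumped = bump_layout([(1, 10), (5, 15), (12, 20)], max_bump=2)
```

With the default threshold, the second interval is pushed to its own row while the third fits back into the first. Lowering `max_bump` below the feature count turns bumping off, and passing a very large value effectively overrides the threshold.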

Handling scale

I wanted Bio::Graphics to be able to draw pictures of a
