Online Book Reader

Beautiful Code [36]

bioinformatics software developers to rapidly visualize a genome and all its annotations. It can be used in a standalone fashion to generate a static image of a region in a variety of graphics formats (including PNG, JPEG, and SVG), or incorporated into a web or desktop application to provide interactive scrolling, zooming, and data exploration.

Figure 12-1 gives an example of an image generated by Bio::Graphics. This image shows a region of the genome of C. elegans (a small soil-dwelling worm) and illustrates several aspects of a typical Bio::Graphics image. The image is divided vertically into a series of horizontal tracks. The top track consists of a scale that runs horizontally from left to right. The units are in kilobases ("k"), indicating thousands of DNA bases. The region shown begins just before position 160,000 of C. elegans chromosome I and extends to just after position 179,000, covering 20,000 base pairs in total. There are four annotation tracks, which illustrate increasingly complex visualizations.

Figure 12-1. A sample image generated by Bio::Graphics

The original image is brightly colored, but has been reduced to grayscale here for printing. The simplest track is "cDNA for RNAi," which shows the positions of a type of experimental reagent that the research community has created for studying the regulation of C. elegans genes. The image contains a single annotation on the right named yk247c7. It consists of a black rectangle that begins at roughly position 173,500 and extends to roughly 176,000. It corresponds to a physical piece of DNA covering this region, which a researcher can order from a biotech supply company and use experimentally to change the activity of the gene that overlaps it—in this case, F56C11.6.

The "WABA alignments" track shows slightly more complex information. It visualizes quantitative data arising from comparing this part of the C. elegans genome to similar regions in a different worm. Regions that are highly similar are dark gray. Regions that are weakly similar are light gray. Regions of intermediate similarity are medium gray.

The "DNA/GC Content" track shows continuously variable quantitative information. This records the ratio of G and C nucleotides to A and T nucleotides across a sliding window of the nucleotide sequence. This ratio correlates roughly with the chances that the corresponding region of the genome contains a protein-coding gene.
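The sliding-window idea behind such a track can be sketched in a few lines of Python. This is a standalone illustration, not the code Bio::Graphics uses: the function name, the window and step sizes, and the toy sequence are all assumptions. It also reports the fraction of G and C among all bases in each window, a common convention, rather than the G+C to A+T ratio described above; for a sequence containing only A, C, G, and T, the two rise and fall together.

```python
def gc_content_windows(sequence, window=4, step=1):
    """Return (start, gc_fraction) pairs, one per full window.

    Hypothetical helper for illustration only; not part of Bio::Graphics.
    """
    seq = sequence.upper()
    results = []
    # Slide the window along the sequence, skipping any partial window at the end.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        results.append((start, gc / window))
    return results


# Toy 10-base sequence (made up for the example), window of 4, step of 2.
profile = gc_content_windows("ATGCGCATAT", window=4, step=2)
print(profile)  # [(0, 0.5), (2, 1.0), (4, 0.5), (6, 0.0)]
```

A real track would use a much larger window (tens to hundreds of bases), which smooths the curve at the cost of blurring short features.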

The "Genes" track contains the most complex data: the positions of two protein-coding genes. Each gene has an internal structure that indicates which parts encode protein (dark gray, but blue in the original) and which have other functions (lighter gray). Notice how the coding regions of the leftmost gene (F56C11.2) correspond closely to the dark-gray, highly similar regions in the WABA alignments track; this is because the protein-coding regions of genes tend to be very strongly conserved across species.

The gene named F56C11.6 is annotated with a function ("carboxylesterases"), indicating that it is related to a family of proteins responsible for a core part of carbon metabolism. In addition, it is shown with two alternative forms, indicating that it can encode more than one distinct protein. The two alternative forms are grouped together and given a distinct label. Notice that there are numerous alignments beneath this gene; this reflects the fact that the gene belongs to a large family of related genes, and each related gene contributes to a different alignment.

The actual DNA nucleotide sequence is not shown in this representation because it isn't physically possible to squeeze a line of 20,000 base pairs into 800 pixels. However, when viewing smaller segments of a genome, Bio::Graphics can draw in the actual letters and ornament them (e.g., change their color or highlighting) to show the start and stop positions of interesting features.
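The arithmetic behind that limit is easy to check. The following Python sketch is an illustration only, with hypothetical function names and an assumed region of exactly 160,000 to 179,999 (the figure's region actually starts slightly earlier); it is not how Bio::Graphics itself computes coordinates. It shows how many bases share each pixel, and where the yk247c7 annotation would land horizontally.

```python
def bases_per_pixel(region_start, region_end, width_px):
    """Number of bases that must share one pixel column (inclusive coordinates)."""
    return (region_end - region_start + 1) / width_px


def base_to_pixel(position, region_start, region_end, width_px):
    """Map a base position to a pixel column, using integer arithmetic."""
    length = region_end - region_start + 1
    return (position - region_start) * width_px // length


# Assumed region: 20,000 bases drawn across an 800-pixel-wide image.
start, end, width = 160_000, 179_999, 800

print(bases_per_pixel(start, end, width))         # 25.0 bases per pixel
print(base_to_pixel(173_500, start, end, width))  # 540 (approximate left edge of yk247c7)
print(base_to_pixel(176_000, start, end, width))  # 640 (approximate right edge of yk247c7)
```

At 25 bases per pixel there is no room for individual letters; they become drawable only once the region is narrow enough that each base gets at least a character's width of pixels.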

12.1.2. Bio::Graphics Requirements

The job of Bio::Graphics is to take a series of genome annotations (called features in BioPerl parlance) and output a graphics
