
Beautiful Code

graphics formats.

Independence from database schemes

The bioinformatics community has designed many dozens of database formats for managing genome annotation data, ranging from simple flat files to sophisticated relational databases. For maximum utility, I wanted to avoid tying Bio::Graphics to any specific database scheme. It should be just as easy to invoke Bio::Graphics to render a genome region described by a flat file as to have it render a segment of a genome described in an Oracle database.


13. The Design of the Gene Sorter

Jim Kent

This chapter is about a moderate-sized program I wrote called the Gene Sorter. At about 20,000 lines in all, the Gene Sorter code is larger than the projects described in most of the other chapters. Though there are some smaller pieces of the Gene Sorter that are quite nice, for me the real beauty is how easy it is to read, understand, and extend the program as a whole. In this chapter, I'll present an overview of what the Gene Sorter does, highlight some of the more important parts of the code, and then discuss the issues involved in making programs longer than a thousand lines enjoyable and even beautiful to work with.

The Gene Sorter helps scientists rapidly sift through the roughly 25,000 genes in the human genome to find those most relevant to their research. The program is part of the http://genome.ucsc.edu web site, which also contains many other tools for working with data generated by the Human Genome Project. The Gene Sorter design is simple and flexible. It incorporates many lessons we learned in two previous generations of programs that serve biomedical data over the Web. The program uses CGI to gather input from the user, makes queries into a MySQL database, and presents the results in HTML. About half of the program code resides in libraries shared with other http://genome.ucsc.edu tools.
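The request flow just described (CGI input, a database query, HTML output) can be sketched in a few lines. This is only an illustration in Python, not the Gene Sorter's actual code: the `gene` table, its columns, the `prefix` parameter, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for MySQL so that the example is self-contained.

```python
import html
import sqlite3
from urllib.parse import parse_qs


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory table. Schema and rows are made up."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE gene (name TEXT, description TEXT)")
    db.executemany(
        "INSERT INTO gene VALUES (?, ?)",
        [
            ("geneA", "example receptor"),
            ("geneB", "example enzyme"),
            ("otherC", "example transcription factor"),
        ],
    )
    return db


def handle_request(query_string: str, db: sqlite3.Connection) -> str:
    """Turn a CGI-style query string into an HTML table of matching genes."""
    # 1. Gather input. A CGI program would read os.environ["QUERY_STRING"].
    params = parse_qs(query_string)
    prefix = params.get("prefix", [""])[0]

    # 2. Query the database, passing user input as a bound parameter.
    rows = db.execute(
        "SELECT name, description FROM gene WHERE name LIKE ? ORDER BY name",
        (prefix + "%",),
    ).fetchall()

    # 3. Present the results as HTML, escaping every value.
    cells = "".join(
        f"<tr><td>{html.escape(name)}</td><td>{html.escape(desc)}</td></tr>"
        for name, desc in rows
    )
    return f"<table>{cells}</table>"


page = handle_request("prefix=gene", make_demo_db())
```

Keeping the three steps in one small function makes the shape of the program easy to see. A real tool would also emit the HTTP headers and handle database errors.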

The human genome is a digital code that somehow contains all of the information needed to build a human body, including that most remarkable of organs, the human brain. The information is stored in three billion bases of DNA. Each base can be an A, C, G, or T. Thus, there are two bits of information per base, or 750 megabytes of information in the genome.
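The arithmetic behind that figure is easy to check. The Python sketch below uses only the numbers given in the text. The `pack` helper and its assignment of bit patterns to bases are illustrative assumptions, not a format described in this chapter.

```python
# Size of the genome at two bits per base (figures from the text).
BASES = 3_000_000_000
BITS_PER_BASE = 2  # four symbols (A, C, G, T) need log2(4) = 2 bits

genome_bytes = BASES * BITS_PER_BASE // 8
genome_megabytes = genome_bytes / 1_000_000  # decimal megabytes

# Hypothetical 2-bit code; which base gets which pattern is arbitrary.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(bases: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(bases), 4):
        chunk = bases[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so each base keeps its slot.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


print(genome_megabytes)       # 750.0
print(len(pack("ACGTACGT")))  # 2
```

Eight bases fit in two bytes, so three billion bases fit in 750 million bytes.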

It is remarkable that the information to build a human being could fit easily into a memory stick in your pocket. Even more remarkably, we know from an evolutionary analysis of many genomes that only about 10 percent of that information is actually needed. The other 90 percent of the genome consists primarily of relics from evolutionary experiments that turned into dead ends, and of the clutter left by virus-like elements known as transposons.

Most of the currently functional parts of the genome are found in genes. Genes consist of regulatory elements that determine how much of the gene product will be made, and the code for the gene product itself. The regulation of genes is often quite complex. Different types of cells use different genes. The same cell type uses different genes in different situations.

The gene products are diverse, too. A large and important class of genes produces messenger RNA (mRNA), which is then translated into proteins. These proteins include the receptor molecules that let the cell sense the environment and interact with other cells, the enzymes that help convert food to more usable forms of energy, and the transcription factors that control the activity of other genes. Though it has not been an easy job, science has identified about 90 percent of the genes in the genome, over 20,000 genes in all.

Most scientific research projects are interested in just a few dozen of these genes. People researching a rare genetic disease examine the patterns of inheritance of the disease to link the disease to perhaps a 10,000,000-base region of a single chromosome. In recent years scientists have tried to associate 100,000-base regions with more common diseases such as diabetes that are partly but not entirely genetic in nature.
