Filling The Gaps In The Human Genome

The human genome was sequenced at the beginning of this decade, and it was thought that the 20,000 or so genes comprising the genome were all known. However, researchers are now finding that the gene catalog is far from complete. A recent paper published in Genome Research (Dec. 2007) by Adam Siepel and his collaborators identifies approximately 300 new human genes that were missed in the initial sequencing studies.
Siepel had been working for several years on developing methods for finding genes using comparative sequence data and testing their accuracy against sets of known genes. In this study, he applied his methodology to the whole genome to see whether he could find missing genes purely by examining evolutionary signatures across multiple mammalian genomes. “So that was part A,” says Siepel. “Then part B was to go after these genes experimentally and show that they were indeed expressed and spliced.” In collaboration with experimental biologists, Siepel followed up on his predictions with very detailed large-scale analyses, using PCR-based methods, to show that the predicted genes did exist, and worked to establish their functional significance.

The Genome Research study reported between 164 and 327 novel human genes missing from existing genome databases, with the number depending on how the known genes are defined. “It was kind of surprising to see how incomplete the representations of some of these genes were in the databases,” says Siepel. For the most part, the genes that they found were either expressed in tissue-specific ways and at specific times in the development cycle, or expressed at low levels. “That wasn’t unexpected, but the particular classes of genes that we found were kind of interesting,” says Siepel. For instance, the two dominant sets of missing genes were those encoding extracellular proteins and motor proteins, which was totally unexpected. “Many sequences that we found turned out to be large extensions of known genes, which was also somewhat of a surprise.”

The gene data from the study is now accessible via the University of California at Santa Cruz (UCSC) genome browser, one of the most widely used publicly available browsing tools for genomic data.
“We’ve actually created a version of the UCSC browser specifically for this Mammalian Gene Collection (MGC) project called the MGC browser, which people can access,” says Siepel. Although the data is publicly available, the gene sequences identified so far are only partial. “So there’s follow-up work going on as part of the MGC project to get full-length transcripts for these genes. Once the full-length transcripts are available, they will get incorporated into various genome databases.”

Other ongoing projects in Siepel’s lab continue to examine the impact of evolution on genomics and vice versa. While the Genome Research paper focused on genes that are conserved across mammalian species, Siepel is also looking at ways to identify genes that are only partially conserved between species or not conserved at all. “It’s a lot harder to use evolutionary information and yet allow for these changes. At the same time it is interesting to identify these genes that differ from one species to the other, as they may help distinguish the genomes in those species.”

Siepel is also preparing to publish his work on positive selection in mammalian genomes. “We are identifying genes that have been under evolutionary pressure to change, where new forms of these genes have been favored by natural selection over the previous version and we are doing a comprehensive examination of positively selected genes in all available mammalian genomes.”
© 2006 Advantage Business Media. All rights reserved.