Homology-based annotation yields 1,042 new candidate genes in the Drosophila melanogaster genome. 2001

S Gopal, M Schroeder, U Pieper, A Sczyrba, G Aytekin-Kurban, S Bekiranov, J E Fajardo, N Eswar, R Sanchez, A Sali, and T Gaasterland
Laboratories of Computational Genomics, The Rockefeller University, New York, New York, USA.

The approach taken to annotate a genome critically affects the number and accuracy of the genes identified in its sequence. Annotation based on stringent gene identification tends to underestimate the complement of genes encoded in a genome. In contrast, over-prediction of putative genes followed by exhaustive computational searches for sequence, motif and structural homology will find rarely expressed, possibly unique, new genes, at the risk of including non-functional genes. We developed a two-stage approach that combines the merits of stringent genome annotation with the benefits of over-prediction. In stage 1, we identify plausible genes regardless of whether they match EST, cDNA or protein sequences from the organism. In stage 2, proteins predicted from the plausible genes are compared at the protein level with EST, cDNA and protein sequences, and protein structures from other organisms. Remote but biologically meaningful protein sequence or structure homologies provide supporting evidence for genuine genes. Applied to the Drosophila melanogaster genome, the method validated 1,042 novel candidate genes after filtering 19,410 plausible genes, of which 12,124 matched the original 13,601 annotated genes. This annotation strategy is applicable to the genomes of all organisms, including human.
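
The stage 2 filter and the reported counts can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `PlausibleGene` record, the `stage2_validate` function, the toy gene list and the E-value cutoff of 1e-5 are all hypothetical, chosen only to show the idea of keeping unannotated plausible genes that have homology support. The arithmetic at the end assumes that the 12,124 matching genes are a subset of the 19,410 plausible genes, which is one reading of the abstract.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlausibleGene:
    """Hypothetical record for a stage 1 (over-predicted) gene."""
    gene_id: str
    matches_annotation: bool  # overlaps a gene in the original annotation
    homology_evalues: List[float] = field(default_factory=list)  # stage 2 hit scores


def stage2_validate(genes, evalue_cutoff=1e-5):
    """Keep unannotated genes with at least one hit at or below the cutoff.

    The cutoff is an illustrative assumption, not a value from the paper.
    """
    return [
        g for g in genes
        if not g.matches_annotation
        and any(e <= evalue_cutoff for e in g.homology_evalues)
    ]


# Toy input: one annotated gene, one supported novel gene, two unsupported ones.
genes = [
    PlausibleGene("g1", True, [1e-30]),
    PlausibleGene("g2", False, [1e-8, 0.5]),
    PlausibleGene("g3", False, [0.2]),
    PlausibleGene("g4", False, []),
]
novel = [g.gene_id for g in stage2_validate(genes)]

# Counts quoted in the abstract.
plausible, matched, annotated, validated = 19410, 12124, 13601, 1042
unmatched = plausible - matched          # plausible genes outside the annotation
validated_fraction = validated / unmatched
recovered_fraction = matched / annotated
```

Under that reading, 7,286 plausible genes fall outside the original annotation, and roughly one in seven of them is retained as a novel candidate.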

UI | MeSH Term | Description | Entries
D004331 | Drosophila melanogaster | A species of fruit fly frequently used in genetics because of the large size of its chromosomes. | D. melanogaster; Drosophila melanogasters; melanogaster, Drosophila
D005821 | Genetic Techniques | Chromosomal, biochemical, intracellular, and other methods used in the study of genetics. | Genetic Technic; Genetic Technics; Genetic Technique; Technic, Genetic; Technics, Genetic; Technique, Genetic; Techniques, Genetic
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens; Man (Taxonomy); Human; Man, Modern; Modern Man
D000818 | Animals | Unicellular or multicellular, heterotrophic organisms, that have sensation and the power of voluntary movement. Under the older five kingdom paradigm, Animalia was one of the kingdoms. Under the modern three domain model, Animalia represents one of the many groups in the domain EUKARYOTA. | Animal; Metazoa; Animalia
D016678 | Genome | The genetic complement of an organism, including all of its GENES, as represented in its DNA, or in some cases, its RNA. | Genomes
D017344 | Genes, Insect | The functional hereditary units of INSECTS. | Insect Genes; Gene, Insect; Insect Gene
D017386 | Sequence Homology, Amino Acid | The degree of similarity between sequences of amino acids. This information is useful for analyzing the genetic relatedness of proteins and species. | Homologous Sequences, Amino Acid; Amino Acid Sequence Homology; Homologs, Amino Acid Sequence; Homologs, Protein Sequence; Homology, Protein Sequence; Protein Sequence Homologs; Protein Sequence Homology; Sequence Homology, Protein; Homolog, Protein Sequence; Homologies, Protein Sequence; Protein Sequence Homolog; Protein Sequence Homologies; Sequence Homolog, Protein; Sequence Homologies, Protein; Sequence Homologs, Protein
D019476 | Insect Proteins | Proteins found in any species of insect. | Insect Protein; Protein, Insect; Proteins, Insect
D020224 | Expressed Sequence Tags | Partial cDNA (DNA, COMPLEMENTARY) sequences that are unique to the cDNAs from which they were derived. | ESTs; Expressed Sequence Tag; Sequence Tag, Expressed; Sequence Tags, Expressed; Tag, Expressed Sequence; Tags, Expressed Sequence
