Analysis of protein domain families in Caenorhabditis elegans. 1997

E L Sonnhammer and R Durbin
Sanger Centre, Cambridge, United Kingdom.

The Caenorhabditis elegans genome sequencing project has completed over half of this nematode's 100-Mb genome. Proteins predicted in the finished sequence have been compiled and released in the database Wormpep. Presented here is a comprehensive analysis of protein domain families in Wormpep 11, which comprises 7299 proteins. The relative abundance of common protein domain families was counted by comparing all Wormpep proteins to the Pfam collection of protein families, which is based on recognition by hidden Markov models. This analysis also identified a number of previously unannotated domains. To investigate new, apparently nematode-specific protein families, Wormpep was clustered into domain families on the basis of sequence similarity using the Domainer program. The largest clusters that lacked clear homology to proteins outside Nematoda were analyzed in further detail, after which some could be assigned a putative function. We compared all proteins in Wormpep 11 to proteins in the human, Saccharomyces cerevisiae, and Haemophilus influenzae genomes. Among the results are the estimate that over two-thirds of the currently known human proteins are likely to have a homologue in the whole C. elegans genome, and the finding that a significant number of proteins are well conserved between C. elegans and H. influenzae but are not found in S. cerevisiae.
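The abundance count described in the abstract can be illustrated with a minimal sketch. It assumes the domain hits have already been produced by some HMM search and reduced to (protein, family) pairs. The function name, the input format, and the example identifiers are hypothetical and are not taken from the paper, and the toy data are not results from Wormpep.

```python
from collections import defaultdict


def family_abundance(hits):
    """Count the number of distinct proteins that contain each domain family.

    `hits` is an iterable of (protein_id, family_name) pairs. This input
    format is an assumption made for illustration only.
    """
    proteins_by_family = defaultdict(set)
    for protein_id, family in hits:
        # A set ensures that a protein with several copies of the same
        # domain is counted only once for that family.
        proteins_by_family[family].add(protein_id)

    # Sort by descending protein count, then by family name for stable output.
    return sorted(
        ((family, len(proteins)) for family, proteins in proteins_by_family.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Toy input with placeholder identifiers (not data from the paper).
hits = [
    ("P1", "pkinase"),
    ("P1", "pkinase"),  # second copy of the same domain in the same protein
    ("P2", "pkinase"),
    ("P2", "EGF"),
    ("P3", "7tm_1"),
]

for family, count in family_abundance(hits):
    print(family, count)
```

Counting proteins rather than individual domain occurrences is one possible design choice; counting every occurrence would instead measure how often a domain is repeated, which may or may not match the counting used in the original analysis.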

UI | MeSH Term | Description | Entries
D008969 | Molecular Sequence Data | Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories. | Sequence Data, Molecular; Molecular Sequencing Data; Data, Molecular Sequence; Data, Molecular Sequencing; Sequencing Data, Molecular
D011506 | Proteins | Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein. | Gene Products, Protein; Gene Proteins; Protein; Protein Gene Products; Proteins, Gene
D006193 | Haemophilus influenzae | A species of HAEMOPHILUS found on the mucous membranes of humans and a variety of animals. The species is further divided into biotypes I through VIII. | Bacterium influenzae; Coccobacillus pfeifferi; Haemophilus meningitidis; Hemophilus influenzae; Influenza-bacillus; Mycobacterium influenzae
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens; Man (Taxonomy); Human; Man, Modern; Modern Man
D000595 | Amino Acid Sequence | The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining PROTEIN CONFORMATION. | Protein Structure, Primary; Amino Acid Sequences; Sequence, Amino Acid; Sequences, Amino Acid; Primary Protein Structure; Primary Protein Structures; Protein Structures, Primary; Structure, Primary Protein; Structures, Primary Protein
D000818 | Animals | Unicellular or multicellular, heterotrophic organisms, that have sensation and the power of voluntary movement. Under the older five kingdom paradigm, Animalia was one of the kingdoms. Under the modern three domain model, Animalia represents one of the many groups in the domain EUKARYOTA. | Animal; Metazoa; Animalia
D001426 | Bacterial Proteins | Proteins found in any species of bacterium. | Bacterial Gene Products; Bacterial Gene Proteins; Gene Products, Bacterial; Bacterial Gene Product; Bacterial Gene Protein; Bacterial Protein; Gene Product, Bacterial; Gene Protein, Bacterial; Gene Proteins, Bacterial; Protein, Bacterial; Proteins, Bacterial
D012441 | Saccharomyces cerevisiae | A species of the genus SACCHAROMYCES, family Saccharomycetaceae, order Saccharomycetales, known as "baker's" or "brewer's" yeast. The dried form is used as a dietary supplement. | Baker's Yeast; Brewer's Yeast; Candida robusta; S. cerevisiae; Saccharomyces capensis; Saccharomyces italicus; Saccharomyces oviformis; Saccharomyces uvarum var. melibiosus; Yeast, Baker's; Yeast, Brewer's; Baker Yeast; S cerevisiae; Baker's Yeasts; Yeast, Baker
D015801 | Helminth Proteins | Proteins found in any species of helminth. | Helminth Protein; Protein, Helminth; Proteins, Helminth
D016208 | Databases, Factual | Extensive collections, reputedly complete, of facts and data garnered from material of a specialized subject area and made available for analysis and application. The collection can be automated by various contemporary methods for retrieval. The concept should be differentiated from DATABASES, BIBLIOGRAPHIC which is restricted to collections of bibliographic references. | Databanks, Factual; Data Banks, Factual; Data Bases, Factual; Data Bank, Factual; Data Base, Factual; Databank, Factual; Database, Factual; Factual Data Bank; Factual Data Banks; Factual Data Base; Factual Data Bases; Factual Databank; Factual Databanks; Factual Database; Factual Databases
