RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data. 2011

Stefan Washietl, Sven Findeiss, Stephan A Müller, Stefan Kalkhof, Martin von Bergen, Ivo L Hofacker, Peter F Stadler, and Nick Goldman
EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, United Kingdom. wash@mit.edu

With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied "out of the box," without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as "noncoding." RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode.
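
The abstract names two kinds of evidence, nucleotide substitution patterns and gap patterns. The toy sketch below illustrates the intuition behind each on a pairwise alignment: coding regions tend to show synonymous codon changes and gaps whose length is a multiple of three. It is not RNAcode's statistical model, which is not specified here; the function names `codon_evidence` and `gap_evidence` and the simple counting scheme are assumptions made for illustration only.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_evidence(ref, other, frame=0):
    """Count (synonymous, nonsynonymous) codon differences between two
    aligned sequences read in the given frame. Hypothetical helper, not
    part of RNAcode."""
    syn = nonsyn = 0
    for i in range(frame, min(len(ref), len(other)) - 2, 3):
        a, b = ref[i:i + 3].upper(), other[i:i + 3].upper()
        # Identical codons, or codons containing gaps or ambiguity
        # symbols, carry no substitution signal in this toy scheme.
        if a == b or a not in CODON_TABLE or b not in CODON_TABLE:
            continue
        if CODON_TABLE[a] == CODON_TABLE[b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


def gap_evidence(seq):
    """Count (frame-preserving, frame-breaking) gap runs in one aligned
    sequence, where '-' marks a gap. Hypothetical helper."""
    runs = [len(m.group()) for m in re.finditer(r"-+", seq)]
    in_frame = sum(1 for n in runs if n % 3 == 0)
    return in_frame, len(runs) - in_frame


if __name__ == "__main__":
    print(codon_evidence("ATGGCTAAAGGT", "ATGGCCAAGGGA"))
    print(gap_evidence("ATG---GCT-AA"))
```

A region dominated by synonymous changes and frame-preserving gaps is consistent with selection on a protein product. A real method must also weigh this against a null model and handle alignment and sequencing errors, as the abstract notes.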

UI | MeSH Term | Description | Entries
D008969 | Molecular Sequence Data | Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories. | Sequence Data, Molecular; Molecular Sequencing Data; Data, Molecular Sequence; Data, Molecular Sequencing; Sequencing Data, Molecular
D010455 | Peptides | Members of the class of compounds composed of AMINO ACIDS joined together by peptide bonds between adjacent amino acids into linear, branched or cyclical structures. OLIGOPEPTIDES are composed of approximately 2-12 amino acids. Polypeptides are composed of approximately 13 or more amino acids. PROTEINS are considered to be larger versions of peptides that can form into complex structures such as ENZYMES and RECEPTORS. | Peptide; Polypeptide; Polypeptides
D004331 | Drosophila melanogaster | A species of fruit fly frequently used in genetics because of the large size of its chromosomes. | D. melanogaster; Drosophila melanogasters; melanogaster, Drosophila
D004926 | Escherichia coli | A species of gram-negative, facultatively anaerobic, rod-shaped bacteria (GRAM-NEGATIVE FACULTATIVELY ANAEROBIC RODS) commonly found in the lower part of the intestine of warm-blooded animals. It is usually nonpathogenic, but some strains are known to produce DIARRHEA and pyogenic infections. Pathogenic strains (virotypes) are classified by their specific pathogenic mechanisms such as toxins (ENTEROTOXIGENIC ESCHERICHIA COLI), etc. | Alkalescens-Dispar Group; Bacillus coli; Bacterium coli; Bacterium coli commune; Diffusely Adherent Escherichia coli; E coli; EAggEC; Enteroaggregative Escherichia coli; Enterococcus coli; Diffusely Adherent E. coli; Enteroaggregative E. coli; Enteroinvasive E. coli; Enteroinvasive Escherichia coli
D005815 | Genetic Code | The meaning ascribed to the BASE SEQUENCE with respect to how it is translated into AMINO ACID SEQUENCE. The start, stop, and order of amino acids of a protein is specified by consecutive triplets of nucleotides called codons (CODON). | Code, Genetic; Codes, Genetic; Genetic Codes
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D000818 | Animals | Unicellular or multicellular, heterotrophic organisms, that have sensation and the power of voluntary movement. Under the older five kingdom paradigm, Animalia was one of the kingdoms. Under the modern three domain model, Animalia represents one of the many groups in the domain EUKARYOTA. | Animal; Metazoa; Animalia
D012333 | RNA, Messenger | RNA sequences that serve as templates for protein synthesis. Bacterial mRNAs are generally primary transcripts in that they do not require post-transcriptional processing. Eukaryotic mRNA is synthesized in the nucleus and must be exported to the cytoplasm for translation. Most eukaryotic mRNAs have a sequence of polyadenylic acid at the 3' end, referred to as the poly(A) tail. The function of this tail is not known for certain, but it may play a role in the export of mature mRNA from the nucleus as well as in helping stabilize some mRNA molecules by retarding their degradation in the cytoplasm. | Messenger RNA; Messenger RNA, Polyadenylated; Poly(A) Tail; Poly(A)+ RNA; Poly(A)+ mRNA; RNA, Messenger, Polyadenylated; RNA, Polyadenylated; mRNA; mRNA, Non-Polyadenylated; mRNA, Polyadenylated; Non-Polyadenylated mRNA; Poly(A) RNA; Polyadenylated mRNA; Non Polyadenylated mRNA; Polyadenylated Messenger RNA; Polyadenylated RNA; RNA, Polyadenylated Messenger; mRNA, Non Polyadenylated
D012984 | Software | Sequential operating programs and data which instruct the functioning of a digital computer. | Computer Programs; Computer Software; Open Source Software; Software Engineering; Software Tools; Computer Applications Software; Computer Programs and Programming; Computer Software Applications; Application, Computer Software; Applications Software, Computer; Applications Softwares, Computer; Applications, Computer Software; Computer Applications Softwares; Computer Program; Computer Software Application; Engineering, Software; Open Source Softwares; Program, Computer; Programs, Computer; Software Application, Computer; Software Applications, Computer; Software Tool; Software, Computer; Software, Computer Applications; Software, Open Source; Softwares, Computer Applications; Softwares, Open Source; Source Software, Open; Source Softwares, Open; Tool, Software; Tools, Software
D013058 | Mass Spectrometry | An analytical method used in determining the identity of a chemical based on its mass using mass analyzers/mass spectrometers. | Mass Spectroscopy; Spectrometry, Mass; Spectroscopy, Mass; Spectrum Analysis, Mass; Analysis, Mass Spectrum; Mass Spectrum Analysis; Analyses, Mass Spectrum; Mass Spectrum Analyses; Spectrum Analyses, Mass
