Universal Features for the Classification of Coding and Non-coding DNA Sequences. 2009

Nicolas Carels, Ramon Vidal, and Diego Frías
Fundação Oswaldo Cruz (FIOCRUZ), Instituto Oswaldo Cruz (IOC), Laboratório de Genômica Funcional e Bioinformática, Rio de Janeiro, RJ, Brazil.

In this report, we revisit simple features that allow coding sequences (CDS) to be distinguished from non-coding DNA. Our sequence sample spans a wide range of codon usage, which suggests that these features are universal. The features that we investigated combine (i) the stop codon distribution, (ii) the product of the purine probabilities in the three positions of nucleotide triplets, (iii) the product of the cytosine, guanine, and adenine probabilities in the 1st, 2nd, and 3rd positions of triplets, respectively, and (iv) the product of the G and C probabilities in the 1st and 2nd positions of triplets, respectively. These features are a natural consequence of the physico-chemical properties of proteins, and their combination classifies CDS and non-coding DNA (introns) with a success rate above 95% for sequences longer than 350 bp. The coding strand and coding frame are implicitly deduced when a sequence is classified as coding.
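
The four features listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the exact formulas nor the decision thresholds, so the function names (`frame_features`, `six_frame_features`), the use of simple positional frequencies as probabilities, and the stop-codon fraction as a summary of the stop codon distribution are all assumptions made for illustration.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)
PURINES = "AG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def frame_features(seq: str, frame: int = 0) -> dict:
    """Compute the four abstract-level features for one reading frame.

    Hypothetical helper: probabilities are estimated as plain
    positional frequencies over the complete triplets of the frame.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    n = len(codons)
    if n == 0:
        raise ValueError("sequence too short for this frame")

    # counts[k][base] = number of triplets with `base` at position k (0-based)
    counts = [Counter(c[k] for c in codons) for k in range(3)]

    def p(k: int, bases: str) -> float:
        # Frequency of any of `bases` at triplet position k
        return sum(counts[k][b] for b in bases) / n

    return {
        # (i) stop codons, summarized here as their fraction in the frame
        "stop_fraction": sum(c in STOP_CODONS for c in codons) / n,
        # (ii) product of purine probabilities over the three positions
        "purine_product": p(0, PURINES) * p(1, PURINES) * p(2, PURINES),
        # (iii) C in 1st, G in 2nd, A in 3rd position
        "cga_product": p(0, "C") * p(1, "G") * p(2, "A"),
        # (iv) G in 1st, C in 2nd position
        "gc_product": p(0, "G") * p(1, "C"),
    }


def six_frame_features(seq: str) -> dict:
    """Features for the three frames of each strand, keyed by (strand, frame)."""
    seq = seq.upper()
    rev = seq.translate(_COMPLEMENT)[::-1]  # reverse complement
    return {
        (strand, frame): frame_features(s, frame)
        for strand, s in (("+", seq), ("-", rev))
        for frame in range(3)
    }
```

Computing the features in all six frames reflects the remark that strand and frame are deduced implicitly: under this sketch, the frame whose features look most coding-like would be the candidate coding frame. How the features are combined into a final decision is left open here, since the abstract does not specify it.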
