Comparison of Methods of Detection of Exceptional Sequences in Prokaryotic Genomes. 2018

I S Rusinov, A S Ershova, A S Karyagina, S A Spirin, and A V Alexeevski
Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119992, Russia.

Many proteins must recognize specific DNA sequences in order to function. The number of recognition sites and their distribution along the DNA may be biologically important. For example, the number of restriction sites is often reduced in prokaryotic and phage genomes, which decreases the probability of DNA cleavage by restriction endonucleases. We call a sequence exceptional if its frequency in a genome differs significantly from the frequency predicted by some mathematical model. An exceptional sequence can be either under- or over-represented, depending on how its observed frequency compares with the predicted one. Exceptional sequences may be biologically meaningful, for example, as targets of DNA-binding proteins or as parts of abundant repetitive elements. Several methods are used to predict the frequency of a short sequence in a genome from the actual frequencies of certain of its subsequences. The most popular are based on Markov chain models. However, no rigorous comparison of these methods has previously been performed. We compared three methods for predicting short sequence frequencies: the maximum-order Markov chain model-based method, the method that uses the geometric mean of extended Markovian estimates, and the method that utilizes the frequencies of all subsequences, including discontiguous ones. We applied them to restriction sites in complete genomes of 2500 prokaryotic species and demonstrated that the results depend greatly on the method used: lists of the 5% most under-represented sites differed by up to 50%. The method designed by Burge and coauthors in 1992, which utilizes all subsequences of the sequence, showed higher precision than the other two methods, both on prokaryotic genomes and on randomly generated sequences after computational imitation of selective pressure. We propose this method as the first choice for detecting exceptional sequences in prokaryotic genomes.
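
To make the idea of a predicted frequency concrete, the following is a minimal sketch of the maximum-order Markov chain estimate, the first of the three compared methods. It assumes the standard form of that estimate, in which the expected count of a word w1...wm is N(w1...wm-1) * N(w2...wm) / N(w2...wm-1), with N denoting observed counts. The function names are hypothetical, only one strand is counted, and this is an illustration rather than the authors' implementation.

```python
def count_overlapping(seq: str, word: str) -> int:
    """Count occurrences of `word` in `seq`, including overlapping ones."""
    if not word:
        raise ValueError("word must be non-empty")
    count = 0
    start = seq.find(word)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are not skipped.
        start = seq.find(word, start + 1)
    return count


def expected_count_max_order(seq: str, word: str) -> float:
    """Maximum-order Markov estimate of the expected count of `word`.

    Uses counts of the two longest proper substrings (prefix and suffix of
    length m - 1) and of their common core of length m - 2.
    Assumption: single strand only, no pseudocounts.
    """
    if len(word) < 3:
        raise ValueError("word must have at least 3 letters")
    prefix = count_overlapping(seq, word[:-1])
    suffix = count_overlapping(seq, word[1:])
    core = count_overlapping(seq, word[1:-1])
    if core == 0:
        # The core never occurs, so the word cannot occur either.
        return 0.0
    return prefix * suffix / core


def observed_to_expected(seq: str, word: str):
    """Ratio of observed to expected count; None if the expectation is zero."""
    expected = expected_count_max_order(seq, word)
    if expected == 0:
        return None
    return count_overlapping(seq, word) / expected


if __name__ == "__main__":
    genome = "ACGTACGTAC"  # toy sequence, not real genomic data
    print(observed_to_expected(genome, "ACGT"))
```

A ratio well below 1 would mark the word as a candidate under-represented sequence, and a ratio well above 1 as a candidate over-represented one. Deciding what counts as a significant deviation is a separate step that this sketch does not cover.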

UI | MeSH Term | Description | Entries
D008390 | Markov Chains | A stochastic process such that the conditional probability distribution for a state at any future instant, given the present state, is unaffected by any additional knowledge of the past history of the system. | Markov Process,Markov Chain,Chain, Markov,Chains, Markov,Markov Processes,Process, Markov,Processes, Markov
D011387 | Prokaryotic Cells | Cells lacking a nuclear membrane so that the nuclear material is either scattered in the cytoplasm or collected in a nucleoid region. | Cell, Prokaryotic,Cells, Prokaryotic,Prokaryotic Cell
D004268 | DNA-Binding Proteins | Proteins which bind to DNA. The family includes proteins which bind to both double- and single-stranded DNA and also includes specific DNA binding proteins in serum which can be used as markers for malignant diseases. | DNA Helix Destabilizing Proteins,DNA-Binding Protein,Single-Stranded DNA Binding Proteins,DNA Binding Protein,DNA Single-Stranded Binding Protein,SS DNA BP,Single-Stranded DNA-Binding Protein,Binding Protein, DNA,DNA Binding Proteins,DNA Single Stranded Binding Protein,DNA-Binding Protein, Single-Stranded,Protein, DNA-Binding,Single Stranded DNA Binding Protein,Single Stranded DNA Binding Proteins
D001483 | Base Sequence | The sequence of PURINES and PYRIMIDINES in nucleic acids and polynucleotides. It is also called nucleotide sequence. | DNA Sequence,Nucleotide Sequence,RNA Sequence,DNA Sequences,Base Sequences,Nucleotide Sequences,RNA Sequences,Sequence, Base,Sequence, DNA,Sequence, Nucleotide,Sequence, RNA,Sequences, Base,Sequences, DNA,Sequences, Nucleotide,Sequences, RNA
D016680 | Genome, Bacterial | The genetic complement of a BACTERIA as represented in its DNA. | Bacterial Genome,Bacterial Genomes,Genomes, Bacterial
D020745 | Genome, Archaeal | The genetic complement of an archaeal organism (ARCHAEA) as represented in its DNA. | Archaeal Genome,Archaeal Genomes,Genomes, Archaeal
D023281 | Genomics | The systematic study of the complete DNA sequences (GENOME) of organisms. Included is construction of complete genetic, physical, and transcript maps, and the analysis of this structural genomic information on a global scale such as in GENOME WIDE ASSOCIATION STUDIES. | Functional Genomics,Structural Genomics,Comparative Genomics,Genomics, Comparative,Genomics, Functional,Genomics, Structural
D030541 | Databases, Genetic | Databases devoted to knowledge about specific genes and gene products. | Genetic Databases,Genetic Sequence Databases,OMIM,Online Mendelian Inheritance In Man,Genetic Data Banks,Genetic Data Bases,Genetic Databanks,Genetic Information Databases,Bank, Genetic Data,Banks, Genetic Data,Data Bank, Genetic,Data Banks, Genetic,Data Base, Genetic,Data Bases, Genetic,Databank, Genetic,Databanks, Genetic,Database, Genetic,Database, Genetic Information,Database, Genetic Sequence,Databases, Genetic Information,Databases, Genetic Sequence,Genetic Data Bank,Genetic Data Base,Genetic Databank,Genetic Database,Genetic Information Database,Genetic Sequence Database,Information Database, Genetic,Information Databases, Genetic,Sequence Database, Genetic,Sequence Databases, Genetic
