A simulated annealing algorithm for finding consensus sequences. 2002

Jonathan M Keith, Peter Adams, Darryn Bryant, Dirk P Kroese, Keith R Mitchelson, Duncan A E Cochran, and Gita H Lala
Department of Mathematics, The University of Queensland, Qld 4072, Australia. jonathan@maths.uq.edu.au.

BACKGROUND: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules.

RESULTS: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.
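The idea of searching for a sequence that minimises the sum of distances to the inputs can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance measure, the move set, or the cooling schedule, so the sketch assumes plain Levenshtein distance, single-character substitute/insert/delete moves, and geometric cooling. All function names and parameter values (`anneal_consensus`, `t_start`, `t_end`, `cooling`) are hypothetical.

```python
import math
import random


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def total_distance(candidate: str, sequences) -> int:
    """Objective: sum of distances from the candidate to every input."""
    return sum(edit_distance(candidate, s) for s in sequences)


def propose(candidate: str, alphabet: str, rng: random.Random) -> str:
    """Random neighbour: substitute, insert, or delete one character."""
    kind = rng.choice(("substitute", "insert", "delete"))
    if kind == "insert" or not candidate:
        pos = rng.randrange(len(candidate) + 1)
        return candidate[:pos] + rng.choice(alphabet) + candidate[pos:]
    pos = rng.randrange(len(candidate))
    if kind == "delete":
        return candidate[:pos] + candidate[pos + 1:]
    return candidate[:pos] + rng.choice(alphabet) + candidate[pos + 1:]


def anneal_consensus(sequences, alphabet="ACGT", t_start=2.0,
                     t_end=0.05, cooling=0.995, seed=0):
    """Assumed schedule: geometric cooling from t_start down to t_end."""
    rng = random.Random(seed)
    current = sequences[0]  # assumption: start from the first input
    current_cost = total_distance(current, sequences)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        cand = propose(current, alphabet, rng)
        cand_cost = total_distance(cand, sequences)
        delta = cand_cost - current_cost
        # Metropolis rule: always accept improvements, and accept
        # worse candidates with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t *= cooling
    return best, best_cost


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTTCGT", "ACGAACGT", "ACGTACG"]
    consensus, cost = anneal_consensus(reads)
    print(consensus, cost)
```

In this sketch, each evaluation of the objective computes one dynamic-programming table per input sequence, so its cost grows with the number of inputs times the product of the candidate and input lengths. Recomputing every distance from scratch for each proposal is the simplest choice, not an efficient one.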

UI | MeSH Term | Description | Entries
D008390 | Markov Chains | A stochastic process such that the conditional probability distribution for a state at any future instant, given the present state, is unaffected by any additional knowledge of the past history of the system. | Markov Process,Markov Chain,Chain, Markov,Chains, Markov,Markov Processes,Process, Markov,Processes, Markov
D008957 | Models, Genetic | Theoretical representations that simulate the behavior or activity of genetic processes or phenomena. They include the use of mathematical equations, computers, and other electronic equipment. | Genetic Models,Genetic Model,Model, Genetic
D009010 | Monte Carlo Method | In statistics, a technique for numerically approximating the solution of a mathematical problem by studying the distribution of some random variable, often generated by a computer. The name alludes to the randomness characteristic of the games of chance played at the gambling casinos in Monte Carlo. (From Random House Unabridged Dictionary, 2d ed, 1993) | Method, Monte Carlo
D011786 | Quality Control | A system for verifying and maintaining a desired level of quality in a product or process by careful planning, use of proper equipment, continued inspection, and corrective action as required. (Random House Unabridged Dictionary, 2d ed) | Control, Quality,Controls, Quality,Quality Controls
D003198 | Computer Simulation | Computer-based representation of physical systems and phenomena such as chemical processes. | Computational Modeling,Computational Modelling,Computer Models,In silico Modeling,In silico Models,In silico Simulation,Models, Computer,Computerized Models,Computer Model,Computer Simulations,Computerized Model,In silico Model,Model, Computer,Model, Computerized,Model, In silico,Modeling, Computational,Modeling, In silico,Modelling, Computational,Simulation, Computer,Simulation, In silico,Simulations, Computer
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D012680 | Sensitivity and Specificity | Binary classification measures to assess test results. Sensitivity or recall rate is the proportion of true positives. Specificity is the probability of correctly determining the absence of a condition. (From Last, Dictionary of Epidemiology, 2d ed) | Specificity,Sensitivity,Specificity and Sensitivity
D015203 | Reproducibility of Results | The statistical reproducibility of measurements (often in a clinical context), including the testing of instrumentation or techniques to obtain reproducible results. The concept includes reproducibility of physiological measurements, which may be used to develop rules to assess probability or prognosis, or response to a stimulus; reproducibility of occurrence of a condition; and reproducibility of experimental results. | Reliability and Validity,Reliability of Result,Reproducibility Of Result,Reproducibility of Finding,Validity of Result,Validity of Results,Face Validity,Reliability (Epidemiology),Reliability of Results,Reproducibility of Findings,Test-Retest Reliability,Validity (Epidemiology),Finding Reproducibilities,Finding Reproducibility,Of Result, Reproducibility,Of Results, Reproducibility,Reliabilities, Test-Retest,Reliability, Test-Retest,Result Reliabilities,Result Reliability,Result Validities,Result Validity,Result, Reproducibility Of,Results, Reproducibility Of,Test Retest Reliability,Validity and Reliability,Validity, Face
D015233 | Models, Statistical | Statistical formulations or analyses which, when applied to data and found to fit the data, are then used to verify the assumptions and parameters used in the analysis. Examples of statistical models are the linear model, binomial model, polynomial model, two-parameter model, etc. | Probabilistic Models,Statistical Models,Two-Parameter Models,Model, Statistical,Models, Binomial,Models, Polynomial,Statistical Model,Binomial Model,Binomial Models,Model, Binomial,Model, Polynomial,Model, Probabilistic,Model, Two-Parameter,Models, Probabilistic,Models, Two-Parameter,Polynomial Model,Polynomial Models,Probabilistic Model,Two Parameter Models,Two-Parameter Model
D016384 | Consensus Sequence | A theoretical representative nucleotide or amino acid sequence in which each nucleotide or amino acid is the one which occurs most frequently at that site in the different sequences which occur in nature. The phrase also refers to an actual sequence which approximates the theoretical consensus. A known CONSERVED SEQUENCE set is represented by a consensus sequence. Commonly observed supersecondary protein structures (AMINO ACID MOTIFS) are often formed by conserved sequences. | Consensus Sequences,Sequence, Consensus,Sequences, Consensus
