Fast statistical alignment. 2009

Robert K Bradley, Adam Roberts, Michael Smoot, Sudeep Juvekar, Jaeyoung Do, Colin Dewey, Ian Holmes, Lior Pachter
Department of Mathematics, University of California Berkeley, Berkeley, California, United States of America. rbradley@berkeley.edu

We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment (FSA) program is based on pair hidden Markov models, which approximate an insertion/deletion process on a tree, and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments that are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment, yet it can align thousands of long sequences; such estimates were previously available only from alignment programs that use computationally expensive Markov chain Monte Carlo approaches. Moreover, FSA uses an unsupervised, query-specific learning procedure for parameter estimation, which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/.
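
To make the phrase "posterior probabilities estimated from pair hidden Markov models" concrete, the following is a minimal sketch of the forward-backward computation on a textbook three-state pair HMM (match, gap in one sequence, gap in the other). It is not FSA's actual model or code: the function name `match_posteriors`, the emission values, and the parameters `delta`, `epsilon`, and `tau` are all illustrative assumptions, whereas FSA learns its parameters from the query sequences.

```python
def match_posteriors(x, y, delta=0.05, epsilon=0.4, tau=0.01):
    """Posterior probability that x[i] is aligned to y[j] under a toy pair HMM.

    Assumed (hypothetical) model: states M (aligned pair), X (x against a gap),
    Y (y against a gap); no direct X<->Y transitions; the start behaves like M.
    delta = gap-open, epsilon = gap-extend, tau = end probability.
    Returns (posteriors, likelihood), where posteriors[i][j] is 0-based.
    """
    n, m = len(x), len(y)

    def p(a, b):
        # Toy DNA pair emission: 4 * 0.22 + 12 * 0.01 = 1.0
        return 0.22 if a == b else 0.01

    q = 0.25                      # toy single-character emission in a gap state
    mm = 1.0 - 2.0 * delta - tau  # M -> M
    gm = 1.0 - epsilon - tau      # X -> M and Y -> M

    def zeros():
        return [[0.0] * (m + 1) for _ in range(n + 1)]

    # Forward pass: f*[i][j] = probability of emitting x[:i], y[:j] and
    # ending in the given state.
    fM, fX, fY = zeros(), zeros(), zeros()
    fM[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = p(x[i - 1], y[j - 1]) * (
                    mm * fM[i - 1][j - 1]
                    + gm * (fX[i - 1][j - 1] + fY[i - 1][j - 1])
                )
            if i > 0:
                fX[i][j] = q * (delta * fM[i - 1][j] + epsilon * fX[i - 1][j])
            if j > 0:
                fY[i][j] = q * (delta * fM[i][j - 1] + epsilon * fY[i][j - 1])
    likelihood = tau * (fM[n][m] + fX[n][m] + fY[n][m])

    # Backward pass: b*[i][j] = probability of emitting x[i:], y[j:] and
    # terminating, given the current state at cell (i, j).
    bM, bX, bY = zeros(), zeros(), zeros()
    bM[n][m] = bX[n][m] = bY[n][m] = tau
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            match = p(x[i], y[j]) * bM[i + 1][j + 1] if i < n and j < m else 0.0
            gap_x = q * bX[i + 1][j] if i < n else 0.0
            gap_y = q * bY[i][j + 1] if j < m else 0.0
            bM[i][j] = mm * match + delta * (gap_x + gap_y)
            bX[i][j] = gm * match + epsilon * gap_x
            bY[i][j] = gm * match + epsilon * gap_y

    # Posterior of the match state at each cell, summed over all alignments.
    posteriors = [
        [fM[i][j] * bM[i][j] / likelihood for j in range(1, m + 1)]
        for i in range(1, n + 1)
    ]
    return posteriors, likelihood


if __name__ == "__main__":
    post, lik = match_posteriors("ACGTT", "AGT")
    for row in post:
        print(" ".join(f"{v:.3f}" for v in row))
```

Each row of the returned matrix sums to at most 1, because a character can be aligned to at most one character of the other sequence; the remainder is the probability that it is gapped. Pairwise matrices of this kind are the sort of input that a method such as sequence annealing would then combine into a multiple alignment.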

UI | MeSH Term | Description | Entries
D008390 | Markov Chains | A stochastic process such that the conditional probability distribution for a state at any future instant, given the present state, is unaffected by any additional knowledge of the past history of the system. | Markov Process; Markov Chain; Chain, Markov; Chains, Markov; Markov Processes; Process, Markov; Processes, Markov
D008957 | Models, Genetic | Theoretical representations that simulate the behavior or activity of genetic processes or phenomena. They include the use of mathematical equations, computers, and other electronic equipment. | Genetic Models; Genetic Model; Model, Genetic
D008969 | Molecular Sequence Data | Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories. | Sequence Data, Molecular; Molecular Sequencing Data; Data, Molecular Sequence; Data, Molecular Sequencing; Sequencing Data, Molecular
D003627 | Data Interpretation, Statistical | Application of statistical procedures to analyze specific observed or assumed facts from a particular study. | Data Analysis, Statistical; Data Interpretations, Statistical; Interpretation, Statistical Data; Statistical Data Analysis; Statistical Data Interpretation; Analyses, Statistical Data; Analysis, Statistical Data; Data Analyses, Statistical; Interpretations, Statistical Data; Statistical Data Analyses; Statistical Data Interpretations
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens; Man (Taxonomy); Human; Man, Modern; Modern Man
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D000595 | Amino Acid Sequence | The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining PROTEIN CONFORMATION. | Protein Structure, Primary; Amino Acid Sequences; Sequence, Amino Acid; Sequences, Amino Acid; Primary Protein Structure; Primary Protein Structures; Protein Structures, Primary; Structure, Primary Protein; Structures, Primary Protein
D000818 | Animals | Unicellular or multicellular, heterotrophic organisms, that have sensation and the power of voluntary movement. Under the older five kingdom paradigm, Animalia was one of the kingdoms. Under the modern three domain model, Animalia represents one of the many groups in the domain EUKARYOTA. | Animal; Metazoa; Animalia
D001185 | Artificial Intelligence | Theory and development of COMPUTER SYSTEMS which perform tasks that normally require human intelligence. Such tasks may include speech recognition, LEARNING; VISUAL PERCEPTION; MATHEMATICAL COMPUTING; reasoning, PROBLEM SOLVING, DECISION-MAKING, and translation of language. | AI (Artificial Intelligence); Computer Reasoning; Computer Vision Systems; Knowledge Acquisition (Computer); Knowledge Representation (Computer); Machine Intelligence; Computational Intelligence; Acquisition, Knowledge (Computer); Computer Vision System; Intelligence, Artificial; Intelligence, Computational; Intelligence, Machine; Knowledge Representations (Computer); Reasoning, Computer; Representation, Knowledge (Computer); System, Computer Vision; Systems, Computer Vision; Vision System, Computer; Vision Systems, Computer
D001483 | Base Sequence | The sequence of PURINES and PYRIMIDINES in nucleic acids and polynucleotides. It is also called nucleotide sequence. | DNA Sequence; Nucleotide Sequence; RNA Sequence; DNA Sequences; Base Sequences; Nucleotide Sequences; RNA Sequences; Sequence, Base; Sequence, DNA; Sequence, Nucleotide; Sequence, RNA; Sequences, Base; Sequences, DNA; Sequences, Nucleotide; Sequences, RNA
