A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data. 1998

P O Lewis
Department of Biology, University of New Mexico, Albuquerque 87131-1091, USA. lewisp@unm.edu

Phylogeny reconstruction is a difficult computational problem because the number of possible solutions grows faster than exponentially with the number of included taxa. For example, for only 14 taxa, there are more than seven trillion possible rooted phylogenetic trees. For this reason, phylogenetic inference methods commonly use clustering algorithms (e.g., the neighbor-joining method) or heuristic search strategies to minimize the amount of time spent evaluating nonoptimal trees. Even heuristic searches can be painfully slow, especially when computationally intensive optimality criteria such as maximum likelihood are used. I describe here a different approach to heuristic searching (using a genetic algorithm) that can tremendously reduce the time required for maximum-likelihood phylogenetic inference, especially for data sets involving large numbers of taxa. Genetic algorithms are simulations of natural selection in which individuals are encoded solutions to the problem of interest. Here, labeled phylogenetic trees are the individuals, and differential reproduction is effected by allowing the number of offspring produced by each individual to be proportional to that individual's rank likelihood score. Natural selection increases the average likelihood in the evolving population of phylogenetic trees, and the genetic algorithm is allowed to proceed until the likelihood of the best individual ceases to improve over time. An example is presented involving rbcL sequence data for 55 taxa of green plants. The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.
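
The scale of the search space and the selection rule described above can be illustrated with a minimal sketch. It assumes the standard counts of strictly bifurcating topologies for n labeled taxa, (2n-5)!! unrooted and (2n-3)!! rooted; for 14 taxa these are 316,234,143,225 and 7,905,853,580,625, so the seven-trillion figure corresponds to the rooted count. The second part shows one plausible reading of rank-proportional reproduction, in which each parent is drawn with probability proportional to its likelihood rank. The function names and the ranking convention (worst = 1, best = N) are assumptions made for illustration and are not details taken from the paper.

```python
import random
from math import prod


def num_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted bifurcating topologies: (2n-5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    # Product of the odd numbers 1, 3, ..., 2n-5.
    return prod(range(1, 2 * n_taxa - 4, 2))


def num_rooted_trees(n_taxa: int) -> int:
    """Number of rooted bifurcating topologies: (2n-3)!! for n >= 2."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    # Product of the odd numbers 1, 3, ..., 2n-3.
    return prod(range(1, 2 * n_taxa - 2, 2))


def select_parents_by_rank(log_likelihoods, n_offspring, rng):
    """Draw parent indices with probability proportional to likelihood rank.

    Assumed convention: the worst individual has rank 1 and the best has
    rank N, so the expected number of offspring is proportional to rank.
    """
    n = len(log_likelihoods)
    # Indices sorted from lowest to highest log-likelihood.
    order = sorted(range(n), key=lambda i: log_likelihoods[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    # Weighted sampling with replacement.
    return rng.choices(range(n), weights=ranks, k=n_offspring)


if __name__ == "__main__":
    print(num_unrooted_trees(14), num_rooted_trees(14))
    rng = random.Random(0)
    # Hypothetical log-likelihood scores for a population of four trees.
    scores = [-1050.2, -1003.7, -1120.9, -998.4]
    print(select_parents_by_rank(scores, 8, rng))
```

Selecting on rank rather than on raw likelihood keeps the selection pressure independent of the absolute spread of scores, which is one common motivation for rank-based schemes in genetic algorithms.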

UI | MeSH Term | Description | Entries
D008957 | Models, Genetic | Theoretical representations that simulate the behavior or activity of genetic processes or phenomena. They include the use of mathematical equations, computers, and other electronic equipment. | Genetic Models; Genetic Model; Model, Genetic
D010802 | Phylogeny | The relationships of groups of organisms as reflected by their genetic makeup. | Community Phylogenetics; Molecular Phylogenetics; Phylogenetic Analyses; Phylogenetic Analysis; Phylogenetic Clustering; Phylogenetic Comparative Analysis; Phylogenetic Comparative Methods; Phylogenetic Distance; Phylogenetic Generalized Least Squares; Phylogenetic Groups; Phylogenetic Incongruence; Phylogenetic Inference; Phylogenetic Networks; Phylogenetic Reconstruction; Phylogenetic Relatedness; Phylogenetic Relationships; Phylogenetic Signal; Phylogenetic Structure; Phylogenetic Tree; Phylogenetic Trees; Phylogenomics; Analyse, Phylogenetic; Analysis, Phylogenetic; Analysis, Phylogenetic Comparative; Clustering, Phylogenetic; Community Phylogenetic; Comparative Analysis, Phylogenetic; Comparative Method, Phylogenetic; Distance, Phylogenetic; Group, Phylogenetic; Incongruence, Phylogenetic; Inference, Phylogenetic; Method, Phylogenetic Comparative; Molecular Phylogenetic; Network, Phylogenetic; Phylogenetic Analyse; Phylogenetic Clusterings; Phylogenetic Comparative Analyses; Phylogenetic Comparative Method; Phylogenetic Distances; Phylogenetic Group; Phylogenetic Incongruences; Phylogenetic Inferences; Phylogenetic Network; Phylogenetic Reconstructions; Phylogenetic Relatednesses; Phylogenetic Relationship; Phylogenetic Signals; Phylogenetic Structures; Phylogenetic, Community; Phylogenetic, Molecular; Phylogenies; Phylogenomic; Reconstruction, Phylogenetic; Relatedness, Phylogenetic; Relationship, Phylogenetic; Signal, Phylogenetic; Structure, Phylogenetic; Tree, Phylogenetic
D010944 | Plants | Multicellular, eukaryotic life forms of kingdom Plantae. Plants acquired chloroplasts by direct endosymbiosis of CYANOBACTERIA. They are characterized by a mainly photosynthetic mode of nutrition; essentially unlimited growth at localized regions of cell divisions (MERISTEMS); cellulose within cells providing rigidity; the absence of organs of locomotion; absence of nervous and sensory systems; and an alternation of haploid and diploid generations. It is a non-taxonomical term most often referring to LAND PLANTS. In broad sense it includes RHODOPHYTA and GLAUCOPHYTA along with VIRIDIPLANTAE. | Plant
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D001483 | Base Sequence | The sequence of PURINES and PYRIMIDINES in nucleic acids and polynucleotides. It is also called nucleotide sequence. | DNA Sequence; Nucleotide Sequence; RNA Sequence; DNA Sequences; Base Sequences; Nucleotide Sequences; RNA Sequences; Sequence, Base; Sequence, DNA; Sequence, Nucleotide; Sequence, RNA; Sequences, Base; Sequences, DNA; Sequences, Nucleotide; Sequences, RNA
D016013 | Likelihood Functions | Functions constructed from a statistical model and a set of observed data which give the probability of that data for various values of the unknown model parameters. Those parameter values that maximize the probability are the maximum likelihood estimates of the parameters. | Likelihood Ratio Test; Maximum Likelihood Estimates; Estimate, Maximum Likelihood; Estimates, Maximum Likelihood; Function, Likelihood; Functions, Likelihood; Likelihood Function; Maximum Likelihood Estimate; Test, Likelihood Ratio
