Optimal gap-affine alignment in O(s) space. 2023

Santiago Marco-Sola, and Jordan M Eizenga, and Andrea Guarracino, and Benedict Paten, and Erik Garrison, and Miquel Moreto
Computer Sciences Department, Barcelona Supercomputing Center, Barcelona 08034, Spain.

Pairwise sequence alignment remains a fundamental problem in computational biology and bioinformatics. Recent advances in genomics and sequencing technologies demand faster and scalable algorithms that can cope with the ever-increasing sequence lengths. Classical pairwise alignment algorithms based on dynamic programming are strongly limited by quadratic requirements in time and memory. The recently proposed wavefront alignment algorithm (WFA) introduced an efficient algorithm to perform exact gap-affine alignment in O(ns) time, where s is the optimal score and n is the sequence length. Notwithstanding these bounds, WFA's O(s2) memory requirements become computationally impractical for genome-scale alignments, leading to a need for further improvement. In this article, we present the bidirectional WFA algorithm, the first gap-affine algorithm capable of computing optimal alignments in O(s) memory while retaining WFA's time complexity of O(ns). As a result, this work improves the lowest known memory bound O(n) to compute gap-affine alignments. In practice, our implementation never requires more than a few hundred MBs aligning noisy Oxford Nanopore Technologies reads up to 1 Mbp long while maintaining competitive execution times. All code is publicly available at https://github.com/smarco/BiWFA-paper. Supplementary data are available at Bioinformatics online.

UI MeSH Term Description Entries
D000465 Algorithms A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. Algorithm
D012984 Software Sequential operating programs and data which instruct the functioning of a digital computer. Computer Programs,Computer Software,Open Source Software,Software Engineering,Software Tools,Computer Applications Software,Computer Programs and Programming,Computer Software Applications,Application, Computer Software,Applications Software, Computer,Applications Softwares, Computer,Applications, Computer Software,Computer Applications Softwares,Computer Program,Computer Software Application,Engineering, Software,Open Source Softwares,Program, Computer,Programs, Computer,Software Application, Computer,Software Applications, Computer,Software Tool,Software, Computer,Software, Computer Applications,Software, Open Source,Softwares, Computer Applications,Softwares, Open Source,Source Software, Open,Source Softwares, Open,Tool, Software,Tools, Software
D016678 Genome The genetic complement of an organism, including all of its GENES, as represented in its DNA, or in some cases, its RNA. Genomes
D017422 Sequence Analysis, DNA A multistage process that includes cloning, physical mapping, subcloning, determination of the DNA SEQUENCE, and information analysis. DNA Sequence Analysis,Sequence Determination, DNA,Analysis, DNA Sequence,DNA Sequence Determination,DNA Sequence Determinations,DNA Sequencing,Determination, DNA Sequence,Determinations, DNA Sequence,Sequence Determinations, DNA,Analyses, DNA Sequence,DNA Sequence Analyses,Sequence Analyses, DNA,Sequencing, DNA
D019295 Computational Biology A field of biology concerned with the development of techniques for the collection and manipulation of biological data, and the use of such data to make biological discoveries or predictions. This field encompasses all computational methods and theories for solving biological problems including manipulation of models and datasets. Bioinformatics,Molecular Biology, Computational,Bio-Informatics,Biology, Computational,Computational Molecular Biology,Bio Informatics,Bio-Informatic,Bioinformatic,Biologies, Computational Molecular,Biology, Computational Molecular,Computational Molecular Biologies,Molecular Biologies, Computational
D023281 Genomics The systematic study of the complete DNA sequences (GENOME) of organisms. Included is construction of complete genetic, physical, and transcript maps, and the analysis of this structural genomic information on a global scale such as in GENOME WIDE ASSOCIATION STUDIES. Functional Genomics,Structural Genomics,Comparative Genomics,Genomics, Comparative,Genomics, Functional,Genomics, Structural

Related Publications

Santiago Marco-Sola, and Jordan M Eizenga, and Andrea Guarracino, and Benedict Paten, and Erik Garrison, and Miquel Moreto
January 1986, Bulletin of mathematical biology,
Santiago Marco-Sola, and Jordan M Eizenga, and Andrea Guarracino, and Benedict Paten, and Erik Garrison, and Miquel Moreto
March 1993, Journal of theoretical biology,
Santiago Marco-Sola, and Jordan M Eizenga, and Andrea Guarracino, and Benedict Paten, and Erik Garrison, and Miquel Moreto
July 1998, Proteins,
Santiago Marco-Sola, and Jordan M Eizenga, and Andrea Guarracino, and Benedict Paten, and Erik Garrison, and Miquel Moreto
May 2011, BMC proceedings,
Santiago Marco-Sola, and Jordan M Eizenga, and Andrea Guarracino, and Benedict Paten, and Erik Garrison, and Miquel Moreto
May 2021, Bioinformatics (Oxford, England),
Santiago Marco-Sola, and Jordan M Eizenga, and Andrea Guarracino, and Benedict Paten, and Erik Garrison, and Miquel Moreto
December 2023, Bioinformatics (Oxford, England),
Santiago Marco-Sola, and Jordan M Eizenga, and Andrea Guarracino, and Benedict Paten, and Erik Garrison, and Miquel Moreto
February 2005, Proteins,
Santiago Marco-Sola, and Jordan M Eizenga, and Andrea Guarracino, and Benedict Paten, and Erik Garrison, and Miquel Moreto
January 2004, Proceedings. IEEE Computational Systems Bioinformatics Conference,
Santiago Marco-Sola, and Jordan M Eizenga, and Andrea Guarracino, and Benedict Paten, and Erik Garrison, and Miquel Moreto
April 2014, Computer methods and programs in biomedicine,
Santiago Marco-Sola, and Jordan M Eizenga, and Andrea Guarracino, and Benedict Paten, and Erik Garrison, and Miquel Moreto
April 1995, Computer applications in the biosciences : CABIOS,
Copied contents to your clipboard!