Fast and accurate read alignment for resequencing. 2012

John C Mu, Hui Jiang, Amirhossein Kiani, Marghoob Mohiyuddin, Narges Bani Asadi, and Wing H Wong
Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA.

BACKGROUND Next-generation sequence analysis has become an important task in both laboratory and clinical settings. A key stage in the majority of sequence analysis workflows, such as resequencing, is the alignment of genomic reads to a reference genome. The accurate alignment of reads with large indels is a computationally challenging task. RESULTS We introduce SeqAlto, a new algorithm for read alignment. For reads of 100 bp or longer, SeqAlto is up to 10× faster than existing algorithms, while retaining high accuracy and the ability to align reads with large (up to 50 bp) indels. This improvement in efficiency is particularly important for the analysis of future sequencing data, where the number of reads approaches many billions. Furthermore, SeqAlto uses less than 8 GB of memory to align against the human genome. SeqAlto is benchmarked against several existing tools with both real and simulated data. AVAILABILITY Linux and Mac OS X binaries, free for academic use, are available at http://www.stanford.edu/group/wonglab/seqalto CONTACT whwong@stanford.edu

UI | MeSH Term | Description | Entries
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D012984 | Software | Sequential operating programs and data which instruct the functioning of a digital computer. | Computer Programs,Computer Software,Open Source Software,Software Engineering,Software Tools,Computer Applications Software,Computer Programs and Programming,Computer Software Applications,Application, Computer Software,Applications Software, Computer,Applications Softwares, Computer,Applications, Computer Software,Computer Applications Softwares,Computer Program,Computer Software Application,Engineering, Software,Open Source Softwares,Program, Computer,Programs, Computer,Software Application, Computer,Software Applications, Computer,Software Tool,Software, Computer,Software, Computer Applications,Software, Open Source,Softwares, Computer Applications,Softwares, Open Source,Source Software, Open,Source Softwares, Open,Tool, Software,Tools, Software
D015894 | Genome, Human | The complete genetic complement contained in the DNA of a set of CHROMOSOMES in a HUMAN. The length of the human genome is about 3 billion base pairs. | Human Genome,Genomes, Human,Human Genomes
D016415 | Sequence Alignment | The arrangement of two or more amino acid or base sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms. | Sequence Homology Determination,Determination, Sequence Homology,Alignment, Sequence,Alignments, Sequence,Determinations, Sequence Homology,Sequence Alignments,Sequence Homology Determinations
D017422 | Sequence Analysis, DNA | A multistage process that includes cloning, physical mapping, subcloning, determination of the DNA SEQUENCE, and information analysis. | DNA Sequence Analysis,Sequence Determination, DNA,Analysis, DNA Sequence,DNA Sequence Determination,DNA Sequence Determinations,DNA Sequencing,Determination, DNA Sequence,Determinations, DNA Sequence,Sequence Determinations, DNA,Analyses, DNA Sequence,DNA Sequence Analyses,Sequence Analyses, DNA,Sequencing, DNA
D054643 | INDEL Mutation | A mutation named with the blend of insertion and deletion. It refers to a length difference between two ALLELES where it is unknowable if the difference was originally caused by a SEQUENCE INSERTION or by a SEQUENCE DELETION. If the number of nucleotides in the insertion/deletion is not divisible by three, and it occurs in a protein coding region, it is also a FRAMESHIFT MUTATION. | INDELs Mutation,Insertions-Deletions Mutation,Insertion-Deletion Mutation,INDEL Mutations,INDELs Mutations,Insertion Deletion Mutation,Insertion-Deletion Mutations,Insertions Deletions Mutation,Insertions-Deletions Mutations,Mutation, INDEL,Mutation, INDELs,Mutation, Insertion-Deletion,Mutation, Insertions-Deletions
D059014 | High-Throughput Nucleotide Sequencing | Techniques of nucleotide sequence analysis that increase the range, complexity, sensitivity, and accuracy of results by greatly increasing the scale of operations and thus the number of nucleotides, and the number of copies of each nucleotide sequenced. The sequencing may be done by analysis of the synthesis or ligation products, hybridization to preexisting sequences, etc. | High-Throughput Sequencing,Illumina Sequencing,Ion Proton Sequencing,Ion Torrent Sequencing,Next-Generation Sequencing,Deep Sequencing,High-Throughput DNA Sequencing,High-Throughput RNA Sequencing,Massively-Parallel Sequencing,Pyrosequencing,DNA Sequencing, High-Throughput,High Throughput DNA Sequencing,High Throughput Nucleotide Sequencing,High Throughput RNA Sequencing,High Throughput Sequencing,Massively Parallel Sequencing,Next Generation Sequencing,Nucleotide Sequencing, High-Throughput,RNA Sequencing, High-Throughput,Sequencing, Deep,Sequencing, High-Throughput,Sequencing, High-Throughput DNA,Sequencing, High-Throughput Nucleotide,Sequencing, High-Throughput RNA,Sequencing, Illumina,Sequencing, Ion Proton,Sequencing, Ion Torrent,Sequencing, Massively-Parallel,Sequencing, Next-Generation
D023281 | Genomics | The systematic study of the complete DNA sequences (GENOME) of organisms. Included is construction of complete genetic, physical, and transcript maps, and the analysis of this structural genomic information on a global scale such as in GENOME WIDE ASSOCIATION STUDIES. | Functional Genomics,Structural Genomics,Comparative Genomics,Genomics, Comparative,Genomics, Functional,Genomics, Structural
