Assembler for de novo assembly of large genomes. 2013

Te-Chin Chu, Chen-Hua Lu, Tsunglin Liu, Greg C Lee, Wen-Hsiung Li, Arthur Chun-Chieh Shih
Institute of Information Science, Academia Sinica, Taipei 115, Taiwan.

Assembling a large genome from next-generation sequencing reads requires large computer memory and a long execution time. To reduce these requirements, we propose an extension-based assembler called JR-Assembler, where J and R stand for "jumping" extension and read "remapping." First, it uses the read count to select good-quality reads as seeds. Second, it extends each seed by a whole-read extension process, which speeds up extension and can jump over short repeats. Third, it uses a dynamic back-trimming process to avoid terminating extension because of sequencing errors. Fourth, it remaps reads to each assembled sequence, and if an assembly error caused by a repeat is detected, it breaks the contig at the repeat boundaries. Fifth, it applies a less stringent extension criterion to connect low-coverage regions. Finally, it merges contigs using unused reads. An extensive comparison of JR-Assembler with current assemblers on datasets from small, medium, and large genomes shows that JR-Assembler achieves better or comparable overall assembly quality while requiring less memory and less central processing unit (CPU) time, especially for large genomes. In addition, a simulation study shows that JR-Assembler outperforms most current assemblers in memory use and CPU time when the read length is 150 bp or longer, indicating that its advantages over current assemblers will increase as read lengths grow with advances in next-generation sequencing technology.
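
The first two steps of the abstract (seed selection by read count, then extension one whole read at a time) can be illustrated with a toy Python sketch. This is not the JR-Assembler implementation: the function names `select_seeds` and `extend_right`, the `min_count` and `min_overlap` parameters, and the greedy largest-overlap rule are all assumptions made for illustration, and the sketch omits repeat jumping, back trimming, remapping, and contig merging.

```python
from collections import Counter


def select_seeds(reads, min_count=2):
    """Return distinct reads seen at least min_count times, most frequent first.

    Assumption: a read that occurs several times is less likely to contain
    a sequencing error, so its count serves as a rough quality proxy.
    """
    counts = Counter(reads)
    return [read for read, n in counts.most_common() if n >= min_count]


def extend_right(seed, reads, min_overlap=4):
    """Greedily extend a seed to the right, one whole read per step.

    At each step, pick the unused read whose prefix has the longest exact
    match with the contig's suffix (hypothetical rule) and append the
    non-overlapping tail of that read to the contig.
    """
    contig = seed
    unused = set(reads) - {seed}
    while True:
        best_overlap, best_read = 0, None
        for read in sorted(unused):  # sorted only to keep runs deterministic
            longest = min(len(read) - 1, len(contig))
            for k in range(longest, min_overlap - 1, -1):
                if contig.endswith(read[:k]):
                    if k > best_overlap:
                        best_overlap, best_read = k, read
                    break  # longest overlap for this read found
        if best_read is None:
            return contig  # no read overlaps the contig end any more
        contig += best_read[best_overlap:]
        unused.remove(best_read)


if __name__ == "__main__":
    # Toy data: every 8-base window of a short made-up sequence,
    # with the first window duplicated so it qualifies as a seed.
    genome = "ACGTTGCATGCCGATAGGCTA"
    reads = [genome[i:i + 8] for i in range(len(genome) - 7)]
    reads.append(reads[0])
    seed = select_seeds(reads)[0]
    print(extend_right(seed, reads))
```

Because each step appends the tail of an entire read rather than a single base, the number of extension steps scales with the number of reads used instead of the contig length, which is the intuition behind the speed-up the abstract attributes to whole-read extension.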

UI MeSH Term Description Entries
D012984 Software Sequential operating programs and data which instruct the functioning of a digital computer. Computer Programs,Computer Software,Open Source Software,Software Engineering,Software Tools,Computer Applications Software,Computer Programs and Programming,Computer Software Applications,Application, Computer Software,Applications Software, Computer,Applications Softwares, Computer,Applications, Computer Software,Computer Applications Softwares,Computer Program,Computer Software Application,Engineering, Software,Open Source Softwares,Program, Computer,Programs, Computer,Software Application, Computer,Software Applications, Computer,Software Tool,Software, Computer,Software, Computer Applications,Software, Open Source,Softwares, Computer Applications,Softwares, Open Source,Source Software, Open,Source Softwares, Open,Tool, Software,Tools, Software
D015203 Reproducibility of Results The statistical reproducibility of measurements (often in a clinical context), including the testing of instrumentation or techniques to obtain reproducible results. The concept includes reproducibility of physiological measurements, which may be used to develop rules to assess probability or prognosis, or response to a stimulus; reproducibility of occurrence of a condition; and reproducibility of experimental results. Reliability and Validity,Reliability of Result,Reproducibility Of Result,Reproducibility of Finding,Validity of Result,Validity of Results,Face Validity,Reliability (Epidemiology),Reliability of Results,Reproducibility of Findings,Test-Retest Reliability,Validity (Epidemiology),Finding Reproducibilities,Finding Reproducibility,Of Result, Reproducibility,Of Results, Reproducibility,Reliabilities, Test-Retest,Reliability, Test-Retest,Result Reliabilities,Result Reliability,Result Validities,Result Validity,Result, Reproducibility Of,Results, Reproducibility Of,Test Retest Reliability,Validity and Reliability,Validity, Face
D016680 Genome, Bacterial The genetic complement of a BACTERIA as represented in its DNA. Bacterial Genome,Bacterial Genomes,Genomes, Bacterial
D016681 Genome, Fungal The complete gene complement contained in a set of chromosomes in a fungus. Fungal Genome,Fungal Genomes,Genomes, Fungal
D059014 High-Throughput Nucleotide Sequencing Techniques of nucleotide sequence analysis that increase the range, complexity, sensitivity, and accuracy of results by greatly increasing the scale of operations and thus the number of nucleotides, and the number of copies of each nucleotide sequenced. The sequencing may be done by analysis of the synthesis or ligation products, hybridization to preexisting sequences, etc. High-Throughput Sequencing,Illumina Sequencing,Ion Proton Sequencing,Ion Torrent Sequencing,Next-Generation Sequencing,Deep Sequencing,High-Throughput DNA Sequencing,High-Throughput RNA Sequencing,Massively-Parallel Sequencing,Pyrosequencing,DNA Sequencing, High-Throughput,High Throughput DNA Sequencing,High Throughput Nucleotide Sequencing,High Throughput RNA Sequencing,High Throughput Sequencing,Massively Parallel Sequencing,Next Generation Sequencing,Nucleotide Sequencing, High-Throughput,RNA Sequencing, High-Throughput,Sequencing, Deep,Sequencing, High-Throughput,Sequencing, High-Throughput DNA,Sequencing, High-Throughput Nucleotide,Sequencing, High-Throughput RNA,Sequencing, Illumina,Sequencing, Ion Proton,Sequencing, Ion Torrent,Sequencing, Massively-Parallel,Sequencing, Next-Generation
D019295 Computational Biology A field of biology concerned with the development of techniques for the collection and manipulation of biological data, and the use of such data to make biological discoveries or predictions. This field encompasses all computational methods and theories for solving biological problems including manipulation of models and datasets. Bioinformatics,Molecular Biology, Computational,Bio-Informatics,Biology, Computational,Computational Molecular Biology,Bio Informatics,Bio-Informatic,Bioinformatic,Biologies, Computational Molecular,Biology, Computational Molecular,Computational Molecular Biologies,Molecular Biologies, Computational
