Comparison of De Novo Assembly Strategies for Bacterial Genomes. 2021

Pengfei Zhang, Dike Jiang, Yin Wang, Xueping Yao, Yan Luo, and Zexiao Yang
Key Laboratory of Animal Diseases and Human Health of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China.

(1) Background: Short-read sequencing allows rapid and accurate analysis of whole bacterial genomes but does not usually yield a complete genome assembly. Long-read sequencing greatly assists in resolving complex bacterial genomes, particularly when combined with short-read Illumina data. However, it is not clear how different assembly strategies affect genomic accuracy, completeness, and protein prediction. (2) Methods: We compared different assembly strategies for Haemophilus parasuis, the cause of Glässer's disease in swine (characterized by fibrinous polyserositis and arthritis), using Illumina sequencing and long reads from either the Oxford Nanopore Technologies (ONT) or the Pacific Biosciences (PacBio) SMRT platform. (3) Results: Assembly with either PacBio or ONT reads, followed by polishing with Illumina reads, facilitated high-quality genome reconstruction and was superior to the long-read-only and hybrid-assembly strategies in terms of accuracy and completeness. An equally effective method was correction with Homopolish after ONT-only assembly, which had the advantage of not requiring additional Illumina sequencing. Furthermore, aligning transcripts to the assembled genomes and their predicted CDSs showed that the sequencing errors of the ONT assembly were mainly indels generated when homopolymer regions were sequenced, which critically affected protein prediction. Polishing can repair these indels and correct other errors. (4) Conclusions: The assembly of bacterial genomes can be achieved directly by using long-read sequencing techniques. To maximize assembly accuracy, it is essential to polish the assembly with homologous sequences of related genomes or with sequencing data from short-read technology.
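The effect of homopolymer indels on protein prediction described in the Results can be illustrated with a minimal sketch. It is not the authors' pipeline. The toy coding sequence, the function names, and the run-length threshold are all hypothetical, and translation uses the standard codon assignments only. It shows how losing a single base inside a homopolymer run shifts the reading frame and changes every downstream residue.

```python
import re
from itertools import product

# Standard codon assignments, listed in TCAG order (third base varies fastest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product("TCAG", repeat=3)), AMINO))


def translate(seq):
    """Translate from position 0 until the first stop codon or the end of seq."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def homopolymer_runs(seq, min_len=4):
    """Return (start, length, base) for every run of one base >= min_len."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), len(m.group(0)), m.group(1)) for m in pattern.finditer(seq)]


def delete_base(seq, pos):
    """Simulate a 1-bp deletion error at index pos."""
    return seq[:pos] + seq[pos + 1:]


# Hypothetical toy CDS (not from the paper): ATG GCT AAA AAA GGT CTG GAA TAA
TOY_CDS = "ATGGCTAAAAAAGGTCTGGAATAA"

runs = homopolymer_runs(TOY_CDS)           # one A-run of length 6 at index 6
erroneous = delete_base(TOY_CDS, runs[0][0])

print("runs:        ", runs)
print("true protein:", translate(TOY_CDS))
print("with indel:  ", translate(erroneous))
```

In this toy case the intact sequence translates to MAKKGLE, while the copy missing one A translates to MAKKVWN and never reaches the original stop codon. This is the kind of frameshift that polishing is meant to remove before gene prediction.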

UI MeSH Term Description Entries
D010802 Phylogeny The relationships of groups of organisms as reflected by their genetic makeup. Community Phylogenetics,Molecular Phylogenetics,Phylogenetic Analyses,Phylogenetic Analysis,Phylogenetic Clustering,Phylogenetic Comparative Analysis,Phylogenetic Comparative Methods,Phylogenetic Distance,Phylogenetic Generalized Least Squares,Phylogenetic Groups,Phylogenetic Incongruence,Phylogenetic Inference,Phylogenetic Networks,Phylogenetic Reconstruction,Phylogenetic Relatedness,Phylogenetic Relationships,Phylogenetic Signal,Phylogenetic Structure,Phylogenetic Tree,Phylogenetic Trees,Phylogenomics,Analyse, Phylogenetic,Analysis, Phylogenetic,Analysis, Phylogenetic Comparative,Clustering, Phylogenetic,Community Phylogenetic,Comparative Analysis, Phylogenetic,Comparative Method, Phylogenetic,Distance, Phylogenetic,Group, Phylogenetic,Incongruence, Phylogenetic,Inference, Phylogenetic,Method, Phylogenetic Comparative,Molecular Phylogenetic,Network, Phylogenetic,Phylogenetic Analyse,Phylogenetic Clusterings,Phylogenetic Comparative Analyses,Phylogenetic Comparative Method,Phylogenetic Distances,Phylogenetic Group,Phylogenetic Incongruences,Phylogenetic Inferences,Phylogenetic Network,Phylogenetic Reconstructions,Phylogenetic Relatednesses,Phylogenetic Relationship,Phylogenetic Signals,Phylogenetic Structures,Phylogenetic, Community,Phylogenetic, Molecular,Phylogenies,Phylogenomic,Reconstruction, Phylogenetic,Relatedness, Phylogenetic,Relationship, Phylogenetic,Signal, Phylogenetic,Structure, Phylogenetic,Tree, Phylogenetic
D000081414 Nanopore Sequencing A sequencing protocol that drives nucleic acids (DNA or RNA) in an electric field through NANOPORES allowing single molecule sequence analysis. Nanopore Sequencings,Sequencing, Nanopore
D000818 Animals Unicellular or multicellular, heterotrophic organisms, that have sensation and the power of voluntary movement. Under the older five kingdom paradigm, Animalia was one of the kingdoms. Under the modern three domain model, Animalia represents one of the many groups in the domain EUKARYOTA. Animal,Metazoa,Animalia
D013552 Swine Any of various animals that constitute the family Suidae and comprise stout-bodied, short-legged omnivorous mammals with thick skin, usually covered with coarse bristles, a rather long mobile snout, and small tail. Included are the genera Babyrousa, Phacochoerus (wart hogs), and Sus, the latter containing the domestic pig (see SUS SCROFA). Phacochoerus,Pigs,Suidae,Warthogs,Wart Hogs,Hog, Wart,Hogs, Wart,Wart Hog
D016415 Sequence Alignment The arrangement of two or more amino acid or base sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms. Sequence Homology Determination,Determination, Sequence Homology,Alignment, Sequence,Alignments, Sequence,Determinations, Sequence Homology,Sequence Alignments,Sequence Homology Determinations
D016680 Genome, Bacterial The genetic complement of a BACTERIA as represented in its DNA. Bacterial Genome,Bacterial Genomes,Genomes, Bacterial
D017422 Sequence Analysis, DNA A multistage process that includes cloning, physical mapping, subcloning, determination of the DNA SEQUENCE, and information analysis. DNA Sequence Analysis,Sequence Determination, DNA,Analysis, DNA Sequence,DNA Sequence Determination,DNA Sequence Determinations,DNA Sequencing,Determination, DNA Sequence,Determinations, DNA Sequence,Sequence Determinations, DNA,Analyses, DNA Sequence,DNA Sequence Analyses,Sequence Analyses, DNA,Sequencing, DNA
D044137 Haemophilus parasuis A species of gram-negative bacteria in the genus HAEMOPHILUS, found in the normal upper respiratory tract of SWINE. Hemophilus parasuis
D059014 High-Throughput Nucleotide Sequencing Techniques of nucleotide sequence analysis that increase the range, complexity, sensitivity, and accuracy of results by greatly increasing the scale of operations and thus the number of nucleotides, and the number of copies of each nucleotide sequenced. The sequencing may be done by analysis of the synthesis or ligation products, hybridization to preexisting sequences, etc. High-Throughput Sequencing,Illumina Sequencing,Ion Proton Sequencing,Ion Torrent Sequencing,Next-Generation Sequencing,Deep Sequencing,High-Throughput DNA Sequencing,High-Throughput RNA Sequencing,Massively-Parallel Sequencing,Pyrosequencing,DNA Sequencing, High-Throughput,High Throughput DNA Sequencing,High Throughput Nucleotide Sequencing,High Throughput RNA Sequencing,High Throughput Sequencing,Massively Parallel Sequencing,Next Generation Sequencing,Nucleotide Sequencing, High-Throughput,RNA Sequencing, High-Throughput,Sequencing, Deep,Sequencing, High-Throughput,Sequencing, High-Throughput DNA,Sequencing, High-Throughput Nucleotide,Sequencing, High-Throughput RNA,Sequencing, Illumina,Sequencing, Ion Proton,Sequencing, Ion Torrent,Sequencing, Massively-Parallel,Sequencing, Next-Generation
