Phenetic Comparison of Prokaryotic Genomes Using k-mers. 2017

Maxime Déraspe, Frédéric Raymond, Sébastien Boisvert, Alexander Culley, Paul H Roy, François Laviolette, Jacques Corbeil
Centre de Recherche en Infectiologie, CHU de Québec-Université Laval, Quebec City, QC, Canada.

Bacterial genomics studies are becoming more extensive and complex, requiring new ways to envision analyses. Using the Ray Surveyor software, we demonstrate that comparing genomes based on their k-mer content allows reconstruction of phenetic trees without the need for prior data curation, such as core genome alignment of a species. We validated the methodology using simulated genomes and previously published phylogenomic studies of Streptococcus pneumoniae and Pseudomonas aeruginosa. We also investigated the relationship of specific genetic determinants with bacterial population structures. By comparing clusters built from the complete genomic content of a genome population with clusters built from specific functional categories of genes, we can determine how the population structures are correlated. Indeed, clustering strains on a subset of k-mers allows determination of how similar that clustering is to the whole-genome clusters. We also applied this methodology to 42 species of bacteria to determine the correlational significance of five important bacterial genomic characteristics. For example, intrinsic resistance is more important in P. aeruginosa than in S. pneumoniae, and the former shows a stronger correlation of its population structure with antibiotic resistance genes. The global view of the pangenome of bacteria also demonstrated the taxa-dependent interaction of population structure with antibiotic resistance, bacteriophage, plasmid, and mobile element k-mer data sets.
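
The general idea of comparing genomes by shared k-mer content, and of correlating a k-mer subset with the whole-genome signal, can be illustrated with a minimal Python sketch. This is not the Ray Surveyor implementation: the Jaccard distance, the Pearson correlation of pairwise distances, the toy sequences, and every function name below are assumptions chosen for illustration only.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Return the set of canonical k-mers of seq.

    A k-mer and its reverse complement are treated as the same word;
    the lexicographically smaller of the two is kept.
    """
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with ambiguous bases
            rc = kmer.translate(_COMPLEMENT)[::-1]
            kmers.add(min(kmer, rc))
    return kmers


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| (an assumed distance, for illustration)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(genomes, k, keep=None):
    """Pairwise distances between genomes given as {name: sequence}.

    If keep is given, each genome's k-mer set is first restricted to it,
    mimicking a comparison limited to one functional category of genes.
    """
    names = sorted(genomes)
    sets = {n: canonical_kmers(genomes[n], k) for n in names}
    if keep is not None:
        sets = {n: s & keep for n, s in sets.items()}
    return {(a, b): jaccard_distance(sets[a], sets[b])
            for a, b in combinations(names, 2)}


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (nan if constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    genomes = {
        "strainA": "ACGTACGTTAGCCGATAGGCTA",
        "strainB": "ACGTACGTTAGCCGATAGGATA",
        "strainC": "TTGACCGGTAGCCGATCCATGA",
    }
    k = 5
    whole = distance_matrix(genomes, k)
    # Hypothetical "gene category": the k-mers of one short marker region.
    subset = canonical_kmers("TAGCCGATAGG", k)
    partial = distance_matrix(genomes, k, keep=subset)
    pairs = sorted(whole)
    r = pearson([whole[p] for p in pairs], [partial[p] for p in pairs])
    print("whole-genome distances:", whole)
    print("subset distances:", partial)
    print("correlation:", r)
```

A pairwise distance table of this kind could then be passed to any standard hierarchical clustering or neighbor-joining routine to draw a phenetic tree; a high correlation between the subset distances and the whole-genome distances would suggest that the chosen gene category tracks the overall population structure.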

UI MeSH Term Description Entries
D010802 Phylogeny The relationships of groups of organisms as reflected by their genetic makeup. Community Phylogenetics,Molecular Phylogenetics,Phylogenetic Analyses,Phylogenetic Analysis,Phylogenetic Clustering,Phylogenetic Comparative Analysis,Phylogenetic Comparative Methods,Phylogenetic Distance,Phylogenetic Generalized Least Squares,Phylogenetic Groups,Phylogenetic Incongruence,Phylogenetic Inference,Phylogenetic Networks,Phylogenetic Reconstruction,Phylogenetic Relatedness,Phylogenetic Relationships,Phylogenetic Signal,Phylogenetic Structure,Phylogenetic Tree,Phylogenetic Trees,Phylogenomics,Analyse, Phylogenetic,Analysis, Phylogenetic,Analysis, Phylogenetic Comparative,Clustering, Phylogenetic,Community Phylogenetic,Comparative Analysis, Phylogenetic,Comparative Method, Phylogenetic,Distance, Phylogenetic,Group, Phylogenetic,Incongruence, Phylogenetic,Inference, Phylogenetic,Method, Phylogenetic Comparative,Molecular Phylogenetic,Network, Phylogenetic,Phylogenetic Analyse,Phylogenetic Clusterings,Phylogenetic Comparative Analyses,Phylogenetic Comparative Method,Phylogenetic Distances,Phylogenetic Group,Phylogenetic Incongruences,Phylogenetic Inferences,Phylogenetic Network,Phylogenetic Reconstructions,Phylogenetic Relatednesses,Phylogenetic Relationship,Phylogenetic Signals,Phylogenetic Structures,Phylogenetic, Community,Phylogenetic, Molecular,Phylogenies,Phylogenomic,Reconstruction, Phylogenetic,Relatedness, Phylogenetic,Relationship, Phylogenetic,Signal, Phylogenetic,Structure, Phylogenetic,Tree, Phylogenetic
D011387 Prokaryotic Cells Cells lacking a nuclear membrane so that the nuclear material is either scattered in the cytoplasm or collected in a nucleoid region. Cell, Prokaryotic,Cells, Prokaryotic,Prokaryotic Cell
D003198 Computer Simulation Computer-based representation of physical systems and phenomena such as chemical processes. Computational Modeling,Computational Modelling,Computer Models,In silico Modeling,In silico Models,In silico Simulation,Models, Computer,Computerized Models,Computer Model,Computer Simulations,Computerized Model,In silico Model,Model, Computer,Model, Computerized,Model, In silico,Modeling, Computational,Modeling, In silico,Modelling, Computational,Simulation, Computer,Simulation, In silico,Simulations, Computer
D005075 Biological Evolution The process of cumulative change over successive generations through which organisms acquire their distinguishing morphological and physiological characteristics. Evolution, Biological
D001419 Bacteria One of the three domains of life (the others being Eukarya and ARCHAEA), also called Eubacteria. They are unicellular prokaryotic microorganisms which generally possess rigid cell walls, multiply by cell division, and exhibit three principal forms: round or coccal, rodlike or bacillary, and spiral or spirochetal. Bacteria can be classified by their response to OXYGEN: aerobic, anaerobic, or facultatively anaerobic; by the mode by which they obtain their energy: chemotrophy (via chemical reaction) or PHOTOTROPHY (via light reaction); for chemotrophs by their source of chemical energy: CHEMOLITHOTROPHY (from inorganic compounds) or chemoorganotrophy (from organic compounds); and by their source for CARBON; NITROGEN; etc.; HETEROTROPHY (from organic sources) or AUTOTROPHY (from CARBON DIOXIDE). They can also be classified by whether or not they stain (based on the structure of their CELL WALLS) with CRYSTAL VIOLET dye: gram-negative or gram-positive. Eubacteria
D012984 Software Sequential operating programs and data which instruct the functioning of a digital computer. Computer Programs,Computer Software,Open Source Software,Software Engineering,Software Tools,Computer Applications Software,Computer Programs and Programming,Computer Software Applications,Application, Computer Software,Applications Software, Computer,Applications Softwares, Computer,Applications, Computer Software,Computer Applications Softwares,Computer Program,Computer Software Application,Engineering, Software,Open Source Softwares,Program, Computer,Programs, Computer,Software Application, Computer,Software Applications, Computer,Software Tool,Software, Computer,Software, Computer Applications,Software, Open Source,Softwares, Computer Applications,Softwares, Open Source,Source Software, Open,Source Softwares, Open,Tool, Software,Tools, Software
D016000 Cluster Analysis A set of statistical methods used to group variables or observations into strongly inter-related subgroups. In epidemiology, it may be used to analyze a closely grouped series of events or cases of disease or other health-related phenomenon with well-defined distribution patterns in relation to time or place or both. Clustering,Analyses, Cluster,Analysis, Cluster,Cluster Analyses,Clusterings
D016680 Genome, Bacterial The genetic complement of a BACTERIA as represented in its DNA. Bacterial Genome,Bacterial Genomes,Genomes, Bacterial
D017422 Sequence Analysis, DNA A multistage process that includes cloning, physical mapping, subcloning, determination of the DNA SEQUENCE, and information analysis. DNA Sequence Analysis,Sequence Determination, DNA,Analysis, DNA Sequence,DNA Sequence Determination,DNA Sequence Determinations,DNA Sequencing,Determination, DNA Sequence,Determinations, DNA Sequence,Sequence Determinations, DNA,Analyses, DNA Sequence,DNA Sequence Analyses,Sequence Analyses, DNA,Sequencing, DNA
D056186 Metagenomics The systematic study of the GENOMES of assemblages of organisms. Community Genomics,Environmental Genomics,Population Genomics,Genomics, Community,Genomics, Environmental,Genomics, Population
