Computational prediction of inter-species relationships through omics data analysis and machine learning. 2018

Diogo Manuel Carvalho Leite, Xavier Brochet, Grégory Resch, Yok-Ai Que, Aitana Neves, Carlos Peña-Reyes
School of Business and Engineering Vaud (HEIG-VD), University of Applied Sciences Western Switzerland (HES-SO), Route de Cheseaux 1, Yverdon-les-Bains, 1400, Switzerland.

BACKGROUND: Antibiotic resistance and its rapid dissemination around the world threaten the efficacy of currently used medical treatments and call for novel, innovative approaches to manage multidrug-resistant infections. Phage therapy, i.e., the use of viruses (phages) that specifically infect and kill bacteria during their life cycle, is one of the most promising alternatives to antibiotics. It relies on correctly matching a target pathogenic bacterium with a therapeutic phage, and this matching is a major challenge. Currently, there is no systematic method to efficiently predict whether a phage-bacterium interaction exists, so candidate pairs must be tested empirically in the laboratory. Herein, we present our approach for developing a computational model able to predict whether a given phage-bacterium pair can interact based on their genomes.

RESULTS: Based on public data from GenBank and phagesDB.org, we collected more than a thousand positive phage-bacterium interactions with their complete genomes. In addition, we generated putative negative (i.e., non-interacting) pairs. From the collected genomes, we extracted a set of informative features based on the distribution of predictive protein-protein interactions and on the primary structure of the proteins (e.g., amino-acid frequency, molecular weight, and chemical composition of each protein). With these features, we generated multiple candidate datasets to train our algorithms. On this basis, we built models with a predictive performance of around 90% in terms of F1-score, sensitivity, specificity, and accuracy, obtained on the test set with 10-fold cross-validation.

CONCLUSIONS: These promising results reinforce the hypothesis that machine learning techniques may produce highly predictive models that accelerate the search for interacting phage-bacterium pairs.
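To make the feature and metric vocabulary of the abstract concrete, the following is a minimal Python sketch, not the authors' pipeline. It computes one of the primary-structure features named above (amino-acid frequency) and the four reported metrics from binary predictions. The function names, the pair encoding (mean frequencies per genome, concatenated), and the toy labels are all assumptions made for illustration; the abstract does not specify how features are aggregated per pair.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_frequencies(protein):
    """Relative frequency of each standard amino acid in one protein sequence."""
    counts = Counter(protein.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def mean_frequencies(proteins):
    """Average the per-protein frequencies over all proteins of one genome."""
    if not proteins:
        return [0.0] * len(AMINO_ACIDS)
    freqs = [amino_acid_frequencies(p) for p in proteins]
    return [sum(f[aa] for f in freqs) / len(freqs) for aa in AMINO_ACIDS]


def pair_features(phage_proteins, bacterium_proteins):
    """Hypothetical pair encoding: phage vector followed by bacterium vector."""
    return mean_frequencies(phage_proteins) + mean_frequencies(bacterium_proteins)


def classification_metrics(y_true, y_pred):
    """F1-score, sensitivity, specificity, and accuracy for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    return {
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "accuracy": ratio(tp + tn, len(y_true)),
    }


if __name__ == "__main__":
    # Toy sequences and labels, invented purely for illustration.
    features = pair_features(["MKVLA", "MAAGT"], ["MKKLLV"])
    print(len(features))
    print(classification_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a 10-fold cross-validation setting, `classification_metrics` would be applied to the held-out fold of each split and the ten results averaged; reporting specificity alongside sensitivity matters here because the negative pairs are putative rather than experimentally confirmed.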

UI | MeSH Term | Description | Entries
D011506 | Proteins | Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein. | Gene Products, Protein; Gene Proteins; Protein; Protein Gene Products; Proteins, Gene
D000069550 | Machine Learning | A type of ARTIFICIAL INTELLIGENCE that enables COMPUTERS to independently initiate and execute LEARNING when exposed to new data. | Transfer Learning; Learning, Machine; Learning, Transfer
D000078332 | Data Analysis | Process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data (https://ori.hhs.gov/education). | Analyses, Data; Analysis, Data; Data Analyses
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D001419 | Bacteria | One of the three domains of life (the others being Eukarya and ARCHAEA), also called Eubacteria. They are unicellular prokaryotic microorganisms which generally possess rigid cell walls, multiply by cell division, and exhibit three principal forms: round or coccal, rodlike or bacillary, and spiral or spirochetal. Bacteria can be classified by their response to OXYGEN: aerobic, anaerobic, or facultatively anaerobic; by the mode by which they obtain their energy: chemotrophy (via chemical reaction) or PHOTOTROPHY (via light reaction); for chemotrophs by their source of chemical energy: CHEMOLITHOTROPHY (from inorganic compounds) or chemoorganotrophy (from organic compounds); and by their source for CARBON; NITROGEN; etc.; HETEROTROPHY (from organic sources) or AUTOTROPHY (from CARBON DIOXIDE). They can also be classified by whether or not they stain (based on the structure of their CELL WALLS) with CRYSTAL VIOLET dye: gram-negative or gram-positive. | Eubacteria
D001435 | Bacteriophages | Viruses whose hosts are bacterial cells. | Phages; Bacteriophage; Phage
D013045 | Species Specificity | The restriction of a characteristic behavior, anatomical structure or physical system, such as immune response; metabolic response, or gene or gene variant to the members of one species. It refers to that property which differentiates one species from another but it is also used for phylogenetic levels higher or lower than the species. | Species Specificities; Specificities, Species; Specificity, Species
D019295 | Computational Biology | A field of biology concerned with the development of techniques for the collection and manipulation of biological data, and the use of such data to make biological discoveries or predictions. This field encompasses all computational methods and theories for solving biological problems including manipulation of models and datasets. | Bioinformatics; Molecular Biology, Computational; Bio-Informatics; Biology, Computational; Computational Molecular Biology; Bio Informatics; Bio-Informatic; Bioinformatic; Biologies, Computational Molecular; Biology, Computational Molecular; Computational Molecular Biologies; Molecular Biologies, Computational
D023281 | Genomics | The systematic study of the complete DNA sequences (GENOME) of organisms. Included is construction of complete genetic, physical, and transcript maps, and the analysis of this structural genomic information on a global scale such as in GENOME WIDE ASSOCIATION STUDIES. | Functional Genomics; Structural Genomics; Comparative Genomics; Genomics, Comparative; Genomics, Functional; Genomics, Structural
