Deep Learning for Novel Antimicrobial Peptide Design. 2021

Christina Wang, Sam Garlick, and Mire Zloh
UCL School of Pharmacy, University College London, London WC1N 1AX, UK.

Antimicrobial resistance is a growing problem in healthcare, as the overuse of antibacterial agents has risen during the COVID-19 pandemic. The need for new antibiotics is high while the arsenal of available agents is shrinking, especially for the treatment of infections by Gram-negative bacteria such as Escherichia coli. Antimicrobial peptides (AMPs) offer a promising route for novel antibiotic development, and deep learning techniques can be used for AMP design. In this study, a long short-term memory (LSTM) generative model and a bidirectional LSTM classification model were constructed to design short novel AMP sequences with potential antibacterial activity against E. coli. Two versions of the generative model and six versions of the classification model were trained and optimised using Bayesian hyperparameter optimisation. These models were used to generate sets of short novel sequences, which were then classified as antimicrobial or non-antimicrobial. The validation accuracies of the classification models were 81.6-88.9%, and the novel AMPs were classified as antimicrobial with accuracies of 70.6-91.7%. Predicted three-dimensional conformations of selected short AMPs exhibited alpha-helical structures with amphipathic surfaces. This demonstrates that LSTMs are effective tools for generating novel AMPs against targeted bacteria and could be used in the search for new antibiotic leads.
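
Sequence models such as the LSTMs described above typically take peptides as fixed-length integer or one-hot encodings. The abstract does not state how the sequences were encoded, so the following is only a minimal sketch under assumptions: the 20-letter alphabet, the padding index 0, and the helper names encode, one_hot and decode are all hypothetical and not taken from the study.

```python
# Hypothetical preprocessing sketch; not the authors' pipeline.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = 0  # assumption: index 0 is reserved for padding
TOKEN_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
INDEX_TOKEN = {i: aa for aa, i in TOKEN_INDEX.items()}


def encode(sequence, max_len):
    """Map a peptide string to a right-padded list of integer tokens."""
    seq = sequence.upper()
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    if len(seq) > max_len:
        raise ValueError("sequence is longer than max_len")
    tokens = [TOKEN_INDEX[aa] for aa in seq]
    return tokens + [PAD] * (max_len - len(tokens))


def one_hot(tokens):
    """Expand integer tokens into one-hot rows (21 columns, padding included)."""
    size = len(AMINO_ACIDS) + 1
    return [[1 if t == i else 0 for i in range(size)] for t in tokens]


def decode(tokens):
    """Turn integer tokens back into a peptide string, dropping padding."""
    return "".join(INDEX_TOKEN[t] for t in tokens if t != PAD)
```

For example, encode("KLAK", 6) gives six tokens with the last two set to the padding index, and decode recovers the original string. A real pipeline would likely also need start and end tokens for generation, which this sketch leaves out.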

UI | MeSH Term | Description | Entries
D008968 | Molecular Conformation | The characteristic three-dimensional shape of a molecule. | Molecular Configuration; 3D Molecular Structure; Configuration, Molecular; Molecular Structure, Three Dimensional; Three Dimensional Molecular Structure; 3D Molecular Structures; Configurations, Molecular; Conformation, Molecular; Conformations, Molecular; Molecular Configurations; Molecular Conformations; Molecular Structure, 3D; Molecular Structures, 3D; Structure, 3D Molecular; Structures, 3D Molecular
D004926 | Escherichia coli | A species of gram-negative, facultatively anaerobic, rod-shaped bacteria (GRAM-NEGATIVE FACULTATIVELY ANAEROBIC RODS) commonly found in the lower part of the intestine of warm-blooded animals. It is usually nonpathogenic, but some strains are known to produce DIARRHEA and pyogenic infections. Pathogenic strains (virotypes) are classified by their specific pathogenic mechanisms such as toxins (ENTEROTOXIGENIC ESCHERICHIA COLI), etc. | Alkalescens-Dispar Group; Bacillus coli; Bacterium coli; Bacterium coli commune; Diffusely Adherent Escherichia coli; E coli; EAggEC; Enteroaggregative Escherichia coli; Enterococcus coli; Diffusely Adherent E. coli; Enteroaggregative E. coli; Enteroinvasive E. coli; Enteroinvasive Escherichia coli
D000069550 | Machine Learning | A type of ARTIFICIAL INTELLIGENCE that enable COMPUTERS to independently initiate and execute LEARNING when exposed to new data. | Transfer Learning; Learning, Machine; Learning, Transfer
D000072756 | Protein Conformation, alpha-Helical | A secondary structure of proteins that is a right-handed helix or coil, where each amino (N-H) group of the peptide backbone contributes a hydrogen bond to the carbonyl(C | alpha-Helical Conformation, Protein; alpha-Helical Protein Conformation; alpha-Helical Structures; alpha-Helices; alpha-Helix; Conformation, Protein alpha-Helical; Conformation, alpha-Helical Protein; Conformations, Protein alpha-Helical; Conformations, alpha-Helical Protein; Protein Conformation, alpha Helical; Protein Conformations, alpha-Helical; alpha Helical Conformation, Protein; alpha Helical Protein Conformation; alpha Helical Structures; alpha Helices; alpha Helix; alpha-Helical Conformations, Protein; alpha-Helical Protein Conformations; alpha-Helical Structure
D000077321 | Deep Learning | Supervised or unsupervised machine learning methods that use multiple layers of data representations generated by nonlinear transformations, instead of individual task-specific ALGORITHMS, to build and train neural network models. | Hierarchical Learning; Learning, Deep; Learning, Hierarchical
D000595 | Amino Acid Sequence | The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining PROTEIN CONFORMATION. | Protein Structure, Primary; Amino Acid Sequences; Sequence, Amino Acid; Sequences, Amino Acid; Primary Protein Structure; Primary Protein Structures; Protein Structures, Primary; Structure, Primary Protein; Structures, Primary Protein
D001499 | Bayes Theorem | A theorem in probability theory named for Thomas Bayes (1702-1761). In epidemiology, it is used to obtain the probability of disease in a group of people with some characteristic on the basis of the overall rate of that disease and of the likelihood of that characteristic in healthy and diseased individuals. The most familiar application is in clinical decision analysis where it is used for estimating the probability of a particular diagnosis given the appearance of some symptoms or test result. | Bayesian Analysis; Bayesian Estimation; Bayesian Forecast; Bayesian Method; Bayesian Prediction; Analysis, Bayesian; Bayesian Approach; Approach, Bayesian; Approachs, Bayesian; Bayesian Approachs; Estimation, Bayesian; Forecast, Bayesian; Method, Bayesian; Prediction, Bayesian; Theorem, Bayes
D012372 | ROC Curve | A graphic means for assessing the ability of a screening test to discriminate between healthy and diseased persons; may also be used in other studies, e.g., distinguishing stimuli responses as to a faint stimuli or nonstimuli. | ROC Analysis; Receiver Operating Characteristic; Analysis, ROC; Analyses, ROC; Characteristic, Receiver Operating; Characteristics, Receiver Operating; Curve, ROC; Curves, ROC; ROC Analyses; ROC Curves; Receiver Operating Characteristics
D052899 | Pore Forming Cytotoxic Proteins | Proteins secreted from an organism which form membrane-spanning pores in target cells to destroy them. This is in contrast to PORINS and MEMBRANE TRANSPORT PROTEINS that function within the synthesizing organism and COMPLEMENT immune proteins. These pore forming cytotoxic proteins are a form of primitive cellular defense which are also found in human LYMPHOCYTES. |
D019540 | Area Under Curve | A statistical means of summarizing information from a series of measurements on one individual. It is frequently used in clinical pharmacology where the AUC from serum levels can be interpreted as the total uptake of whatever has been administered. As a plot of the concentration of a drug against time, after a single dose of medicine, producing a standard shape curve, it is a means of comparing the bioavailability of the same drug made by different companies. (From Winslade, Dictionary of Clinical Research, 1992) | AUC; Area Under Curves; Curve, Area Under; Curves, Area Under; Under Curve, Area; Under Curves, Area
