Traditional Machine and Deep Learning for Predicting Toxicity Endpoints. 2022

Ulf Norinder
Department of Computer and Systems Sciences, Stockholm University, 164 07 Kista, Sweden.

Molecular structure-property modeling is an increasingly important tool for identifying compounds with desired properties, because drug discovery and development are expensive and resource-intensive and suffer from toxicity-related attrition in late phases. Interest in applying deep learning techniques has recently increased considerably. This investigation compares approaches ranging from traditional physicochemical descriptors with machine learning, through autoencoder-generated descriptors, to two different descriptor-free, Simplified Molecular Input Line Entry System (SMILES) based deep learning architectures of the Bidirectional Encoder Representations from Transformers (BERT) type, using the Mondrian aggregated conformal prediction method as the overarching framework. For the binary CATMoS non-toxic dataset, which is almost equally balanced, all methods perform equally well. For the very-toxic dataset, which has an 11-fold difference between the two classes, the MolBERT model, based on a large pre-trained network, performs somewhat better than the rest, with high efficiency for both classes (0.93-0.94) and high sensitivity, specificity, and balanced accuracy (0.86-0.87). The descriptor-free, SMILES-based deep learning BERT architectures seem capable of producing well-balanced predictive models with defined applicability domains. This work also demonstrates that Mondrian conformal prediction handles the class imbalance problem gracefully, without over- or under-sampling, class weighting, or cost-sensitive methods.
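
The abstract does not spell out how Mondrian conformal prediction is computed, so the following is only a minimal sketch of the general technique, not the author's implementation. It assumes a nonconformity score of one minus the predicted class probability, aggregation by taking the median p-value over several calibration splits, and efficiency defined as the fraction of single-label prediction sets. All function names and the toy numbers are hypothetical.

```python
from statistics import median


def mondrian_p_values(cal_probs, cal_labels, test_probs):
    """Class-conditional (Mondrian) conformal p-values for one test compound.

    cal_probs  : per-class probability lists for the calibration compounds
    cal_labels : true class index of each calibration compound
    test_probs : per-class probabilities for the test compound
    """
    p_values = []
    for c, prob_c in enumerate(test_probs):
        # Assumed nonconformity score: 1 - probability given to class c.
        # Only calibration compounds whose true class is c are used, which
        # is what makes the procedure Mondrian (class-conditional).
        cal_scores = [1.0 - p[c] for p, y in zip(cal_probs, cal_labels) if y == c]
        test_score = 1.0 - prob_c
        n_at_least = sum(1 for s in cal_scores if s >= test_score)
        p_values.append((n_at_least + 1) / (len(cal_scores) + 1))
    return p_values


def aggregate_p_values(p_value_lists):
    """Assumed aggregation: per-class median over several calibration splits."""
    return [median(ps) for ps in zip(*p_value_lists)]


def prediction_set(p_values, significance):
    """Classes whose p-value exceeds the chosen significance level."""
    return {c for c, p in enumerate(p_values) if p > significance}


def efficiency(prediction_sets):
    """Assumed definition: fraction of prediction sets with exactly one label."""
    return sum(1 for s in prediction_sets if len(s) == 1) / len(prediction_sets)


# Toy, imbalanced calibration set: four compounds of class 0, two of class 1.
cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
cal_labels = [0, 0, 0, 0, 1, 1]

p = mondrian_p_values(cal_probs, cal_labels, [0.25, 0.75])
labels = prediction_set(p, significance=0.25)
```

Because each class is calibrated only against its own calibration compounds, the minority class gets its own error control, which is why no resampling or class weighting is needed in this scheme.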

UI | MeSH Term | Description | Entries
D000069550 | Machine Learning | A type of ARTIFICIAL INTELLIGENCE that enables COMPUTERS to independently initiate and execute LEARNING when exposed to new data. | Transfer Learning; Learning, Machine; Learning, Transfer
D000077321 | Deep Learning | Supervised or unsupervised machine learning methods that use multiple layers of data representations generated by nonlinear transformations, instead of individual task-specific ALGORITHMS, to build and train neural network models. | Hierarchical Learning; Learning, Deep; Learning, Hierarchical
D015394 | Molecular Structure | The location of the atoms, groups or ions relative to one another in a molecule, as well as the number, type and location of covalent bonds. | Structure, Molecular; Molecular Structures; Structures, Molecular
D055808 | Drug Discovery | The process of finding chemicals for potential therapeutic use. | Drug Prospecting; Discovery, Drug; Prospecting, Drug
