Evaluating Scalable Uncertainty Estimation Methods for Deep Learning-Based Molecular Property Prediction (2020)

Gabriele Scalia, Colin A. Grambow, Barbara Pernici, Yi-Pei Li, and William H. Green
Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.

Advances in deep neural network (DNN)-based molecular property prediction have recently led to the development of models of remarkable accuracy and generalization ability, with graph convolutional neural networks (GCNNs) reporting state-of-the-art performance for this task. However, some challenges remain, and one of the most important, which has yet to be fully addressed, concerns uncertainty quantification. DNN performance is affected by the volume and the quality of the training samples. Therefore, establishing when and to what extent a prediction can be considered reliable is just as important as outputting accurate predictions, especially when out-of-domain molecules are targeted. Several methods to account for uncertainty in DNNs have recently been proposed, most of which are based on approximate Bayesian inference; among these, only a few scale to the large data sets required in applications. Evaluating and comparing these methods has recently attracted great interest, but results are generally fragmented and, for molecular property prediction, absent. In this paper, we quantitatively compare scalable techniques for uncertainty estimation in GCNNs. We introduce a set of quantitative criteria that capture different aspects of uncertainty and use these criteria to compare MC-dropout, Deep Ensembles, and bootstrapping, both theoretically, in a unified framework that separates aleatoric and epistemic uncertainty, and experimentally, on public data sets. Our experiments quantify the performance of the different uncertainty estimation methods and their impact on uncertainty-related error reduction. Our findings indicate that Deep Ensembles and bootstrapping consistently outperform MC-dropout, each with context-specific pros and cons. Our analysis leads to a better understanding of the roles of aleatoric and epistemic uncertainty, also in relation to the features of the target data set, and highlights the challenge posed by out-of-domain uncertainty.
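To make the aleatoric/epistemic split mentioned in the abstract concrete, the sketch below shows the standard law-of-total-variance decomposition commonly used with Deep Ensembles, bootstrapping, and MC-dropout. It is a minimal illustration, not the authors' implementation: it assumes each model (or stochastic forward pass) outputs a predictive mean and variance, and the function name and toy numbers are hypothetical.

import numpy as np

def decompose_uncertainty(means, variances):
    """Split total predictive variance into aleatoric and epistemic parts.

    means, variances: arrays of shape (n_models, n_molecules), one row per
    ensemble member, bootstrap resample, or MC-dropout forward pass.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    prediction = means.mean(axis=0)        # ensemble-averaged prediction
    epistemic = means.var(axis=0)          # disagreement between models
    aleatoric = variances.mean(axis=0)     # average predicted data noise
    return prediction, aleatoric, epistemic, aleatoric + epistemic

# Toy usage with 3 hypothetical models and 2 molecules:
means = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]]
variances = [[0.05, 0.10], [0.04, 0.12], [0.06, 0.09]]
print(decompose_uncertainty(means, variances))

The same decomposition applies whether the rows come from independently trained ensemble members, models fit to bootstrap resamples, or repeated forward passes with dropout left active at test time.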

MeSH Terms

UI: D000077321
Term: Deep Learning
Description: Supervised or unsupervised machine learning methods that use multiple layers of data representations generated by nonlinear transformations, instead of individual task-specific ALGORITHMS, to build and train neural network models.
Entries: Hierarchical Learning; Learning, Deep; Learning, Hierarchical

UI: D001499
Term: Bayes Theorem
Description: A theorem in probability theory named for Thomas Bayes (1702-1761). In epidemiology, it is used to obtain the probability of disease in a group of people with some characteristic on the basis of the overall rate of that disease and of the likelihood of that characteristic in healthy and diseased individuals. The most familiar application is in clinical decision analysis, where it is used for estimating the probability of a particular diagnosis given the appearance of some symptoms or test result (a worked numerical example follows this list).
Entries: Bayesian Analysis; Bayesian Estimation; Bayesian Forecast; Bayesian Method; Bayesian Prediction; Analysis, Bayesian; Bayesian Approach; Approach, Bayesian; Approachs, Bayesian; Bayesian Approachs; Estimation, Bayesian; Forecast, Bayesian; Method, Bayesian; Prediction, Bayesian; Theorem, Bayes

UI: D016571
Term: Neural Networks, Computer
Description: A computer architecture, implementable in either hardware or software, modeled after biological neural networks. Like the biological system in which the processing capability is a result of the interconnection strengths between arrays of nonlinear processing nodes, computerized neural networks, often called perceptrons or multilayer connectionist models, consist of neuron-like units. A homogeneous group of units makes up a layer. These networks are good at pattern recognition. They are adaptive, performing tasks by example, and thus are better for decision-making than are linear learning machines or cluster analysis. They do not require explicit programming.
Entries: Computational Neural Networks; Connectionist Models; Models, Neural Network; Neural Network Models; Neural Networks (Computer); Perceptrons; Computational Neural Network; Computer Neural Network; Computer Neural Networks; Connectionist Model; Model, Connectionist; Model, Neural Network; Models, Connectionist; Network Model, Neural; Network Models, Neural; Network, Computational Neural; Network, Computer Neural; Network, Neural (Computer); Networks, Computational Neural; Networks, Computer Neural; Networks, Neural (Computer); Neural Network (Computer); Neural Network Model; Neural Network, Computational; Neural Network, Computer; Neural Networks, Computational; Perceptron

UI: D035501
Term: Uncertainty
Description: The condition in which reasonable knowledge regarding risks, benefits, or the future is not available.
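As a worked numerical example of the diagnostic use described under D001499 (Bayes Theorem) above, the snippet below computes the probability of disease given a positive test from the disease prevalence and the test's behavior in diseased and healthy individuals. The numbers are hypothetical and chosen only for illustration.

# Hypothetical numbers, for illustration of Bayes' theorem only.
prevalence = 0.01            # P(disease): overall rate of the disease
sensitivity = 0.95           # P(test positive | disease)
false_positive_rate = 0.05   # P(test positive | healthy)

p_test_pos = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_disease_given_pos = sensitivity * prevalence / p_test_pos
print(round(p_disease_given_pos, 3))  # 0.161: a 1% prior rises to about 16%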
