Comparison Of Scaling Methods To Obtain Calibrated Probabilities Of Activity For Protein-Ligand Predictions

Lewis Mervin, Avid M Afzal, Ola Engkvist, Andreas Bender

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2020)

Abstract

In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a probability of binding to a protein target is not yet satisfactorily addressed. In this study, we compared the performance of three such methods, namely, Platt scaling (PS), isotonic regression (IR), and Venn-ABERS predictors (VA), in calibrating prediction scores obtained from ligand-target prediction comprising the Naive Bayes, support vector machines, and random forest (RF) algorithms. Calibration quality was assessed on bioactivity data available at AstraZeneca for 40 million data points (compound-target pairs) across 2112 targets and performance was assessed using stratified shuffle split (SSS) and leave 20% of scaffolds out (L20SO) validation. VA achieved the best calibration performances across all machine learning algorithms and cross validation methods tested and also the lowest (best) Brier score loss (mean squared difference between the outputted probability estimates assigned to a compound and the actual outcome). In comparison, the PS and IR methods can actually degrade the assigned probability estimates, particularly for the RF for SSS and during L20SO. Sphere exclusion, a method to sample additional (putative) inactive compounds, was shown to inflate the overall Brier score loss performance, through the artificial requirement for inactive molecules to be dissimilar to active compounds, but was shown to result in overconfident estimators. VA was able to successfully calibrate the probability estimates for even small calibration sets. The multiprobability values (lower and upper probability boundary intervals) were shown to produce large discordance for test set molecules that are neither very similar nor very dissimilar to the active training set, which were hence difficult to predict, suggesting that multiprobability discordance can be used as an estimate for target prediction uncertainty. 
Overall, we were able to show in this work that VA scaling of target prediction models is able to improve probability estimates in all testing instances and is currently being applied for in-house approaches.
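The Brier score loss used above as the calibration metric is the mean squared difference between the predicted probability of activity and the observed outcome (1 for active, 0 for inactive), so lower is better. The following is a minimal sketch of that definition only, not the authors' implementation; the function name `brier_score_loss` and the example values are illustrative assumptions and do not come from the study.

```python
from typing import Sequence


def brier_score_loss(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Illustrative helper (hypothetical name, not taken from the paper).
    """
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    if len(probabilities) == 0:
        raise ValueError("at least one prediction is required")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Made-up example: three compound-target pairs, two active (1) and one inactive (0).
predicted = [0.9, 0.2, 0.7]
observed = [1, 0, 1]
score = brier_score_loss(predicted, observed)
print(round(score, 4))
```

A perfectly confident and correct predictor scores 0.0 under this definition, while a predictor that always outputs 0.5 scores 0.25 regardless of the outcomes.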

Keywords

protein–ligand predictions, scaling methods, calibrated probabilities
