Comparison of Logp and Logd Correction Models Trained with Public and Proprietary Data Sets.

Journal of Computer-Aided Molecular Design (2022)

Abstract
In drug discovery, the octanol/water partition and distribution coefficients, logP and logD, are widely used as metrics of the lipophilicity of molecules, which in turn strongly influences the bioactivity and bioavailability of potential drugs. There are a variety of established methods, mostly fragment- or atom-based, to calculate logP, while logD prediction generally relies on calculated logP and pKa to estimate the neutral and ionized populations at a given pH. Algorithms such as ClogP have limitations that generally lead to systematic errors for chemically related molecules, while pKa estimation is generally more difficult because of the interplay of electronic, inductive, and conjugation effects in ionizable moieties. We propose an integrated machine learning QSAR modeling approach that predicts logD by training the model on experimental data while using ClogP and pKa predicted by commercial software as model descriptors. By optimizing the loss function for the ClogD calculated by the software, we build a correction model that incorporates both descriptors from the software and available experimental logD data. Additionally, we calculate logP from the logD model using the software-predicted pKa values. Here, we have trained models using publicly or commercially available logD data to show that this approach can improve on commercial software predictions of lipophilicity. When applied to other logD data sets, this approach extends the domain of applicability of logD and logP predictions beyond that of commercial software. The performance of these models compares favorably with that of models built with a larger set of proprietary logD data.
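
The relationship between logP, pKa, and logD mentioned in the abstract can be illustrated with the textbook expression for a molecule with a single ionizable group. This is a minimal sketch, not the paper's model: it assumes that only the neutral species partitions into octanol and that there is exactly one acidic or basic site, and the function names (`ionization_term`, `logd_from_logp`, `logp_from_logd`) are hypothetical names chosen for this example.

```python
import math


def ionization_term(pka, ph, kind):
    """Return log10(1 + 10^x), the lipophilicity penalty from ionization.

    Assumption: one ionizable group, and only the neutral form
    partitions into octanol. `kind` is "acid" or "base".
    """
    if kind == "acid":
        exponent = ph - pka   # acids ionize as pH rises above pKa
    elif kind == "base":
        exponent = pka - ph   # bases ionize as pH falls below pKa
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return math.log10(1.0 + 10.0 ** exponent)


def logd_from_logp(logp, pka, ph=7.4, kind="acid"):
    """Estimate logD at a given pH from logP and pKa."""
    return logp - ionization_term(pka, ph, kind)


def logp_from_logd(logd, pka, ph=7.4, kind="acid"):
    """Invert the same relation: recover logP from logD and pKa."""
    return logd + ionization_term(pka, ph, kind)


if __name__ == "__main__":
    # Illustrative values only, not taken from any data set.
    logd = logd_from_logp(logp=3.0, pka=4.4, ph=7.4, kind="acid")
    print(f"logD at pH 7.4: {logd:.3f}")
    print(f"logP recovered: {logp_from_logd(logd, 4.4, 7.4, 'acid'):.3f}")
```

The second function mirrors the step in which logP is obtained from a logD prediction and a predicted pKa. Under these simplifying assumptions, any error in the pKa carries over directly into the converted value.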
Keywords
Partition coefficient, Distribution coefficient, LogP, LogD, ClogP, ClogD, pKa, BioByte, ChEMBL, Machine learning, QSAR models