Generalization within in silico screening
arXiv (2023)
Abstract
In silico screening uses predictive models to select a batch of compounds
with favorable properties from a library for experimental validation. Unlike
conventional learning paradigms, success in this context is measured by the
performance of the predictive model on the selected subset of compounds rather
than the entire set of predictions. By extending learning theory, we show that
the selectivity of the selection policy can significantly impact
generalization, with a higher risk of errors occurring when exclusively
selecting predicted positives and when targeting rare properties. Our analysis
suggests a way to mitigate these challenges. We show that generalization can be
markedly enhanced when considering a model's ability to predict the fraction of
desired outcomes in a batch. This is promising, as the primary aim of screening
is not necessarily to pinpoint the label of each compound individually, but
rather to assemble a batch enriched for desirable compounds. Our theoretical
insights are empirically validated across diverse tasks, architectures, and
screening scenarios, underscoring their applicability.
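
To make the distinction between whole-library performance and batch-level performance concrete, here is a minimal simulation sketch. It is not the paper's method or experimental setup. The synthetic score distribution, the top-k selection policy, the 0.5 threshold, and every name (`simulate_screen`, `hit_rate`, `predicted_fraction`) are assumptions introduced purely for illustration. The sketch compares the model's accuracy over the entire library with the fraction of true positives in the selected batch, and with the fraction the model itself predicts for that batch when its scores are read as probabilities.

```python
import random


def simulate_screen(n_library=10_000, prevalence=0.05, batch_size=100, seed=0):
    """Toy in silico screen on synthetic data (illustrative assumptions only)."""
    rng = random.Random(seed)

    # Ground-truth labels: 1 = compound has the desired (possibly rare) property.
    labels = [1 if rng.random() < prevalence else 0 for _ in range(n_library)]

    # Hypothetical noisy model scores clipped to [0, 1]; positives tend to
    # score higher. These are NOT guaranteed to be calibrated probabilities.
    scores = [
        min(1.0, max(0.0, rng.gauss(0.7 if y else 0.3, 0.2)))
        for y in labels
    ]

    # Conventional view: accuracy over every prediction in the library.
    library_accuracy = sum(
        (s >= 0.5) == bool(y) for s, y in zip(scores, labels)
    ) / n_library

    # Screening view: only the selected batch matters (assumed top-k policy).
    ranked = sorted(range(n_library), key=lambda i: scores[i], reverse=True)
    batch = ranked[:batch_size]

    # Actual fraction of desirable compounds in the batch.
    hit_rate = sum(labels[i] for i in batch) / batch_size

    # Fraction the model predicts for the batch, treating scores as probabilities.
    predicted_fraction = sum(scores[i] for i in batch) / batch_size

    return {
        "library_accuracy": library_accuracy,
        "hit_rate": hit_rate,
        "predicted_fraction": predicted_fraction,
        "batch": batch,
    }


if __name__ == "__main__":
    result = simulate_screen()
    for name in ("library_accuracy", "hit_rate", "predicted_fraction"):
        print(f"{name}: {result[name]:.3f}")
```

Varying `prevalence` and `batch_size` in this toy setting gives a rough feel for how a rarer target property or a more selective policy can pull the batch-level hit rate away from both whole-library accuracy and the model's own predicted fraction. The numbers it prints describe only this synthetic setup and say nothing about the paper's reported results.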