Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs
CoRR (2024)
Abstract
Identifying how much a model p_θ(Y|X) knows about the stochastic real-world process p(Y|X) it was trained on is important to ensure it avoids producing incorrect or "hallucinated" answers or taking unsafe actions. But this is difficult for generative models because probabilistic predictions do not distinguish between per-response noise (aleatoric uncertainty) and lack of knowledge about the process (epistemic uncertainty), and existing epistemic uncertainty quantification techniques tend to be overconfident when the model underfits. We propose a general strategy for teaching a model to both approximate p(Y|X) and also estimate the remaining gaps between p_θ(Y|X) and p(Y|X): train it to predict pairs of independent responses drawn from the true conditional distribution, allow it to "cheat" by observing one response while predicting the other, then measure how much it cheats. Remarkably, we prove that being good at cheating (i.e., cheating whenever it improves your prediction) is equivalent to being second-order calibrated, a principled extension of ordinary calibration that allows us to construct provably-correct frequentist confidence intervals for p(Y|X) and detect incorrect responses with high probability. We demonstrate empirically that our approach accurately estimates how much models don't know across ambiguous image classification, (synthetic) language modeling, and partially-observable navigation tasks, outperforming existing techniques.
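The pair-prediction recipe in the abstract is concrete enough to sketch in code. Below is a minimal, hypothetical illustration for a K-class classifier in PyTorch; the joint-logits parameterization, the function names, and the total-variation "cheat gap" proxy are our own assumptions for exposition, not the authors' released implementation or exact estimator. It shows (1) a training loss on pairs of independent responses and (2) a per-input score measuring how much "cheating" (observing Y1) changes the prediction of Y2:

```python
import torch
import torch.nn.functional as F

# Hypothetical setup: "joint_logits" is a [B, K, K] tensor produced by a
# pair-prediction head, parameterizing a joint distribution over (Y1, Y2).
# This is a sketch of the abstract's recipe, not the paper's released code.

def pair_nll_loss(joint_logits, y1, y2):
    """Training objective: NLL of a pair of independent responses (y1, y2),
    both drawn from the true conditional p(Y|x) for the same input x."""
    B, K, _ = joint_logits.shape
    log_joint = F.log_softmax(joint_logits.view(B, K * K), dim=-1)
    idx = y1 * K + y2  # flat index of the (y1, y2) cell
    return -log_joint.gather(-1, idx.unsqueeze(-1)).mean()

def cheat_gap(joint_logits):
    """Epistemic-uncertainty proxy: how much does observing Y1 change the
    model's prediction of Y2? If the model truly knew p(Y|x), the learned
    pair distribution would factorize, the conditional p_θ(Y2|Y1, x) would
    match the marginal p_θ(Y2|x), and this gap would be zero."""
    B, K, _ = joint_logits.shape
    joint = F.softmax(joint_logits.view(B, K * K), dim=-1).view(B, K, K)
    marginal_y2 = joint.sum(dim=1)                    # p_θ(Y2|x): [B, K]
    cond_y2 = joint / joint.sum(dim=2, keepdim=True)  # p_θ(Y2|Y1=i, x): [B, K, K]
    # Total-variation distance between each conditional and the marginal,
    # averaged over Y1 (a simplified proxy; the paper's estimator differs):
    tv = 0.5 * (cond_y2 - marginal_y2.unsqueeze(1)).abs().sum(dim=2)  # [B, K]
    p_y1 = joint.sum(dim=2)                           # p_θ(Y1|x): [B, K]
    return (p_y1 * tv).sum(dim=1)                     # [B]

# Example usage with random data (K = 5 classes, batch of 8):
logits = torch.randn(8, 5, 5)
y1, y2 = torch.randint(0, 5, (8,)), torch.randint(0, 5, (8,))
loss = pair_nll_loss(logits, y1, y2)
uncertainty = cheat_gap(logits)
```

In the paper, a well-trained pair predictor's cheating behavior is what licenses second-order calibration and the resulting frequentist confidence intervals for p(Y|X); this sketch only illustrates the pair-prediction training signal and one way the measurement step could look.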