Conformal Off-Policy Prediction in Contextual Bandits

NeurIPS 2022

Citations: 10 | Views: 65
Abstract
Most off-policy evaluation methods for contextual bandits have focused on the expected outcome of a policy, which is estimated via methods that at best provide only asymptotic guarantees. However, in many applications, the expectation may not be the best measure of performance as it does not capture the variability of the outcome. In addition, particularly in safety-critical settings, stronger guarantees than asymptotic correctness may be required. To address these limitations, we consider a novel application of conformal prediction to contextual bandits. Given data collected under a behavioral policy, we propose conformal off-policy prediction (COPP), which can output reliable predictive intervals for the outcome under a new target policy. We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup, and empirically demonstrate the utility of COPP compared with existing methods on synthetic and real-world data.
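
The abstract does not spell out the construction, but the general mechanism COPP builds on is weighted split conformal prediction: nonconformity scores computed on calibration data gathered under the behavioral policy are reweighted by a policy likelihood ratio before taking the quantile. The sketch below illustrates that idea under simplifying assumptions; it is not the paper's exact estimator. Intervals are formed per context-action pair, and the names `mu` (a fitted outcome model), `pi_target`, and `pi_behavior` (vectorized action-probability callables) are illustrative assumptions.

```python
import numpy as np

def weighted_quantile(scores, weights, q):
    """q-quantile of the weighted empirical distribution of the scores."""
    order = np.argsort(scores)
    scores, weights = scores[order], weights[order]
    cum = np.cumsum(weights) / np.sum(weights)
    idx = np.searchsorted(cum, q)  # first index where cumulative weight >= q
    return scores[min(idx, len(scores) - 1)]

def copp_interval(x_test, a_test, mu, cal_x, cal_a, cal_y,
                  pi_target, pi_behavior, alpha=0.1):
    """Simplified weighted split-conformal interval for the outcome of a
    (context, action) pair, reweighting behavioral-policy calibration data
    toward the target policy. All callables are illustrative assumptions:
    mu(x, a) returns predicted outcomes; pi_target(a, x) and
    pi_behavior(a, x) return action probabilities under each policy."""
    # Nonconformity scores on the held-out calibration split (absolute residuals).
    scores = np.abs(cal_y - mu(cal_x, cal_a))
    # Likelihood-ratio weights correct for the shift from behavioral to target policy.
    w = pi_target(cal_a, cal_x) / pi_behavior(cal_a, cal_x)
    w_test = pi_target(a_test, x_test) / pi_behavior(a_test, x_test)
    # Weighted conformal prediction places a point mass at +inf for the unseen
    # test score; this is what makes the coverage hold in finite samples.
    scores_aug = np.append(scores, np.inf)
    weights_aug = np.append(w, w_test)
    qhat = weighted_quantile(scores_aug, weights_aug, 1.0 - alpha)
    pred = mu(x_test, a_test)
    return pred - qhat, pred + qhat
```

Appending a +inf score for the test point is the standard device from weighted conformal prediction: it makes the marginal coverage guarantee hold for any finite calibration set rather than only asymptotically, which matches the finite-sample flavor of the guarantee described in the abstract.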
Keywords
conformal prediction,contextual bandits,uncertainty quantification,robust ML