Offline Policy Evaluation in Large Action Spaces via Outcome-Oriented Action Grouping

WWW 2023 (2023)

Cited 4
Abstract
Offline policy evaluation (OPE) aims to accurately estimate the performance of a hypothetical policy using only historical data, and has drawn increasing attention in a wide range of applications including recommender systems and personalized medicine. With the increasing granularity of consumer data, many industries have started exploring larger action candidate spaces to support more precise personalized actions. While inverse propensity score (IPS) is a standard OPE estimator, it suffers from increasingly severe variance issues as the action space grows. To address this issue, we theoretically prove that the estimation variance can be reduced by merging actions into groups, while the differences among the grouped actions' effects on the outcome can induce extra bias. Motivated by these results, we propose a novel IPS estimator with outcome-oriented action Grouping (GroupIPS), which leverages a Lipschitz-regularized network to measure the distance between action effects in an embedding space and merges nearest action neighbors. This strategy enables more robust estimation, achieving smaller variance while inducing only minor additional bias. Empirically, extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of our proposed method.
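As a rough illustration of the grouping idea (a minimal sketch, not the paper's exact estimator), the snippet below replaces action-level importance weights with group-level ones. The `group_of` map is a hypothetical precomputed action-to-group assignment, standing in for the nearest-neighbor merging that the paper derives from a Lipschitz-regularized embedding network; all function and variable names here are illustrative.

```python
import numpy as np

def grouped_ips(rewards, actions, pi_e, pi_b, group_of):
    """Group-level IPS estimate (sketch).

    rewards  : (n,) observed rewards r_i
    actions  : (n,) logged action indices a_i
    pi_e     : (n, K) target-policy probabilities pi(a | x_i)
    pi_b     : (n, K) logging-policy probabilities pi_0(a | x_i)
    group_of : (K,) action index -> group id (assumed given here;
               the paper instead learns an outcome-oriented grouping)
    """
    n, K = pi_e.shape
    G = group_of.max() + 1
    # Aggregate action-level propensities into group-level ones:
    # pi(g | x) = sum over actions a in group g of pi(a | x).
    onehot = np.eye(G)[group_of]                # (K, G) membership matrix
    pi_e_g = pi_e @ onehot                      # (n, G)
    pi_b_g = pi_b @ onehot                      # (n, G)
    g = group_of[actions]                       # group of each logged action
    idx = np.arange(n)
    # Group-level importance weights are bounded by the action-level ones
    # in spread, which is the source of the variance reduction.
    weights = pi_e_g[idx, g] / pi_b_g[idx, g]
    return np.mean(weights * rewards)
```

With one group per action this reduces to standard IPS; as the groups coarsen, the importance weights (and hence the variance) shrink, at the cost of extra bias whenever actions merged into the same group have different effects on the outcome, which is exactly the trade-off the abstract describes.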