'I pick you choose': Joint human-algorithm decision making in multi-armed bandits

ICLR 2023 (2023)

Abstract
Online learning in multi-armed bandits has been a rich area of research for decades, resulting in numerous "no-regret" algorithms that efficiently learn the arm with the highest expected reward. However, in many settings the final decision of which arm to pull is not under the control of the algorithm itself. For example, a driving app typically suggests a subset of routes (arms) to the driver, who ultimately makes the final choice about which to select. Typically, the human also wishes to learn the optimal arm from historical reward information, but decides which arm to pull according to a potentially different objective function, such as being more or less myopic about exploiting near-term rewards. In this paper, we show when this joint human-algorithm system can achieve good performance. Specifically, we explore multiple possible frameworks for human objectives and prove theoretical regret bounds for each. Finally, we include experimental results exploring how regret varies with the human decision-maker's objective, as well as with the number of arms presented.
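The setting described above can be sketched in simulation: an algorithm proposes a subset of arms, and a human with their own objective makes the final pull. The sketch below pairs a UCB-style suggester with a myopic (greedy) human; these specific rules, function names, and parameters are illustrative assumptions, not the paper's exact framework.

```python
import math
import random


def joint_bandit(true_means, k=2, horizon=2000, seed=0):
    """Illustrative joint human-algorithm bandit loop (an assumption for
    illustration, not the paper's exact model):
      - the algorithm suggests the k arms with the highest UCB scores;
      - a myopic 'human' pulls the suggested arm with the highest
        empirical mean (exploiting near-term reward).
    Returns the cumulative (pseudo-)regret over the horizon."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n      # pulls per arm
    sums = [0.0] * n      # total reward per arm
    regret = 0.0
    best = max(true_means)

    for t in range(horizon):
        # Algorithm side: UCB scores; unpulled arms score +inf so every
        # arm is eventually suggested.
        scores = [
            sums[i] / counts[i] + math.sqrt(2 * math.log(t + 1) / counts[i])
            if counts[i] > 0 else float("inf")
            for i in range(n)
        ]
        suggested = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]

        # Human side: myopically pick the best-looking suggested arm.
        choice = max(
            suggested,
            key=lambda i: sums[i] / counts[i] if counts[i] > 0 else float("inf"),
        )

        # Bernoulli reward and bookkeeping.
        reward = 1.0 if rng.random() < true_means[choice] else 0.0
        counts[choice] += 1
        sums[choice] += reward
        regret += best - true_means[choice]

    return regret
```

Because the human only ever sees the algorithm's suggestion set, varying `k` (the number of arms presented) and the human's selection rule is enough to reproduce the kind of regret comparisons the experiments describe.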
Keywords
human-algorithm collaboration, multi-armed bandits, complementarity