The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity

Operations Research (2023)

Abstract
We consider “incentivized exploration”: a version of multiarmed bandits where the choice of arms is controlled by self-interested agents. The algorithm can only issue recommendations and needs to incentivize the agents to explore, even though they prefer to exploit. The algorithm controls the flow of information and relies on information asymmetry to create incentives. This line of work, motivated by misaligned incentives in recommendation systems, has received considerable attention in the “economics and computation” community. We focus on the “price of incentives”: the loss in performance, broadly construed, incurred for the sake of incentive compatibility. We prove that Thompson sampling, a standard bandit algorithm, is incentive compatible if initialized with sufficiently many data points. The performance loss because of incentives is, therefore, limited to the initial rounds when these data points are collected. The problem is largely reduced to that of sample complexity. How many rounds are needed to collect even one sample of each arm? We zoom in on this question, which is perhaps the most basic question one could ask about incentivized exploration. We characterize the dependence on agents' beliefs and the number of arms (which was essentially ignored in prior work), providing matching upper and lower bounds. Typically, the optimal sample complexity is polynomial in the number of arms and exponential in the “strength of beliefs.”
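The mechanism described in the abstract (run Thompson sampling only after every arm has been sampled enough times, so that the price of incentives is confined to the initial data-collection phase) can be illustrated with a short sketch. The following is a minimal Beta-Bernoulli Thompson sampling routine with an initial seeding phase, not the paper's exact construction; the function name `thompson_sampling`, the parameter `n_init`, the seed handling, and the uniform Beta(1, 1) priors are all illustrative assumptions.

import random

def thompson_sampling(arms, horizon, n_init=1, seed=0):
    """arms: true success probabilities (unknown to the learner).
    Returns the sequence of chosen arm indices over `horizon` rounds."""
    rng = random.Random(seed)
    k = len(arms)
    successes = [0] * k  # observed rewards per arm
    failures = [0] * k   # observed non-rewards per arm
    choices = []

    def pull(i):
        reward = 1 if rng.random() < arms[i] else 0
        successes[i] += reward
        failures[i] += 1 - reward
        choices.append(i)

    # Initial data-collection phase: n_init samples of every arm.
    # The paper's sample-complexity question asks how many rounds an
    # incentive-compatible recommendation scheme needs to obtain even
    # one such sample per arm.
    for i in range(k):
        for _ in range(n_init):
            pull(i)

    # Standard Beta-Bernoulli Thompson sampling on the seeded
    # posteriors (uniform Beta(1, 1) priors assumed here).
    for _ in range(horizon - k * n_init):
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i])
                   for i in range(k)]
        pull(max(range(k), key=lambda i: samples[i]))
    return choices

# Example: three arms, 1,000 rounds, 5 seed samples per arm.
picks = thompson_sampling([0.3, 0.5, 0.7], horizon=1000, n_init=5)
print(picks.count(2) / len(picks))  # fraction of pulls on the best arm

In this sketch the seeding phase simply pulls each arm directly; the paper's contribution is characterizing how many rounds an incentive-compatible scheme needs to accomplish that same seeding, which is typically polynomial in the number of arms and exponential in the strength of the agents' beliefs.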
Keywords
incentivizing exploration, Thompson sampling, complexity