Bernoulli bandits: an empirical comparison.

ESANN (2015)

Abstract
An empirical comparative study is made of a sample of action selection policies on a test suite of Bernoulli multi-armed bandit problems with K = 10, K = 20 and K = 50 arms, for each of which we consider several success probabilities. For such problems the rewards are either success or failure, with an unknown success rate. Our study focuses on e-greedy, UCB1-Tuned, Thompson sampling, the Gittins index policy, the knowledge gradient and a new hybrid algorithm. The last two are not well-known in computer science. In this paper, we examine policy dependence on the horizon and report results which suggest that a new hybridized procedure based on Thompson sampling improves on its regret.
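To illustrate the kind of policy compared in the abstract, here is a minimal sketch of Thompson sampling on a Bernoulli bandit. This is not the paper's implementation; the function name, the fixed uniform Beta(1, 1) priors, and the regret bookkeeping are illustrative assumptions. Each arm keeps a Beta posterior over its unknown success rate; at every step one sample is drawn per arm and the arm with the largest sample is pulled.

```python
import random

def thompson_sampling(success_probs, horizon, seed=0):
    """Illustrative Thompson sampling for a Bernoulli bandit.

    Each arm i maintains a Beta(successes_i + 1, failures_i + 1)
    posterior over its success probability. Returns the cumulative
    (pseudo-)regret against the best arm over the given horizon.
    """
    rng = random.Random(seed)
    k = len(success_probs)
    successes = [0] * k
    failures = [0] * k
    best = max(success_probs)
    regret = 0.0
    for _ in range(horizon):
        # Draw one posterior sample per arm and pull the arg max.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Bernoulli reward: success (1) or failure (0).
        reward = 1 if rng.random() < success_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - success_probs[arm]
    return regret
```

Horizon dependence, which the paper examines, can be probed by calling this with increasing `horizon` values and watching how the cumulative regret grows.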