Bernoulli bandits: an empirical comparison.

ESANN (2015)

Abstract
An empirical comparative study is made of a sample of action selection policies on a test suite of Bernoulli multi-armed bandit problems with K = 10, K = 20 and K = 50 arms, for each of which we consider several success probabilities. For such problems the rewards are either success or failure, with an unknown success rate. Our study focuses on e-greedy, UCB1-Tuned, Thompson sampling, the Gittins index policy, the knowledge gradient and a new hybrid algorithm. The last two are not well-known in computer science. In this paper, we examine policy dependence on the horizon and report results which suggest that a new hybridized procedure based on Thompson sampling improves on its regret.
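To illustrate the kind of policy compared in the abstract, here is a minimal sketch of Thompson sampling on a Bernoulli bandit. This is not the paper's implementation; the function name, the fixed uniform Beta(1, 1) priors, and the regret bookkeeping are illustrative assumptions. Each arm keeps a Beta posterior over its unknown success rate; at every step one sample is drawn per arm and the arm with the largest sample is pulled.

```python
import random

def thompson_sampling(success_probs, horizon, seed=0):
    """Illustrative Thompson sampling for a Bernoulli bandit.

    Each arm i maintains a Beta(successes_i + 1, failures_i + 1)
    posterior over its success probability. Returns the cumulative
    (pseudo-)regret against the best arm over the given horizon.
    """
    rng = random.Random(seed)
    k = len(success_probs)
    successes = [0] * k
    failures = [0] * k
    best = max(success_probs)
    regret = 0.0
    for _ in range(horizon):
        # Draw one posterior sample per arm and pull the arg max.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Bernoulli reward: success (1) or failure (0).
        reward = 1 if rng.random() < success_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - success_probs[arm]
    return regret
```

Horizon dependence, which the paper examines, can be probed by calling this with increasing `horizon` values and watching how the cumulative regret grows.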