An efficient algorithm for learning with semi-bandit feedback

Algorithmic Learning Theory (ALT 2013)

Abstract

We consider the problem of online combinatorial optimization under semi-bandit feedback. The goal of the learner is to sequentially select its actions from a combinatorial decision set so as to minimize its cumulative loss. We propose a learning algorithm for this problem based on combining the Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss estimation procedure called Geometric Resampling (GR). Unlike previous solutions, the resulting algorithm can be implemented efficiently for any decision set on which efficient offline combinatorial optimization is possible at all. Assuming that the elements of the decision set can be described with $d$-dimensional binary vectors with at most $m$ non-zero entries, we show that the expected regret of our algorithm after $T$ rounds is $O(m\sqrt{dT\log d})$. As a side result, we also improve the best known regret bounds for FPL in the full-information setting to $O(m^{3/2}\sqrt{T\log d})$, gaining a factor of $\sqrt{d/m}$ over previous bounds for this algorithm.
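To make the combination of FPL and GR more concrete, here is a minimal NumPy sketch. It is not the authors' reference implementation, and several details are assumptions. The perturbation vector is taken to have i.i.d. Exp(1) entries, and the leader is chosen as $\arg\min_v v^\top(\eta \hat L - Z)$. The decision set is a hypothetical example, namely all binary vectors with exactly $m$ ones, so the offline oracle just sorts. The truncation parameter `M` caps the number of resampling rounds. The function names `best_m_set`, `geometric_resampling` and `fpl_gr` are illustrative, and `eta` and `M` are not tuned as in the paper's analysis.

The idea behind GR is that FPL's selection probability $q_{t,i}$ for each component has no closed form, but it can be sampled. The number of fresh perturbed draws needed until component $i$ is picked again is geometric with mean $1/q_{t,i}$, truncated here at `M`. Multiplying the observed loss by this count therefore gives an approximately unbiased loss estimate, and it only requires calls to the offline oracle.

```python
import numpy as np


def best_m_set(scores: np.ndarray, m: int) -> np.ndarray:
    """Hypothetical offline oracle for the example decision set of all binary
    vectors with exactly m ones: argmin_v v^T scores picks the m smallest scores."""
    v = np.zeros(scores.shape[0], dtype=int)
    v[np.argsort(scores)[:m]] = 1
    return v


def geometric_resampling(oracle, L_hat, V, eta, M, rng):
    """For each chosen component i, count the fresh perturbed-leader draws
    needed until i is selected again (capped at M); K_i approximates 1/q_i."""
    d = L_hat.shape[0]
    K = np.full(d, M, dtype=int)      # default for components never re-hit
    waiting = V.astype(bool)          # only chosen components need a count
    for k in range(1, M + 1):
        if not waiting.any():
            break
        Z = rng.exponential(1.0, size=d)
        hit = waiting & (oracle(eta * L_hat - Z) == 1)
        K[hit] = k
        waiting &= ~hit
    return K


def fpl_gr(losses, m, eta, M, seed=0):
    """Run FPL with GR loss estimates on a (T, d) loss matrix with entries in [0, 1]."""
    T, d = losses.shape
    rng = np.random.default_rng(seed)
    oracle = lambda s: best_m_set(s, m)
    L_hat = np.zeros(d)               # cumulative estimated losses
    total_loss = 0.0
    for t in range(T):
        Z = rng.exponential(1.0, size=d)      # perturbation with Exp(1) entries
        V = oracle(eta * L_hat - Z)           # perturbed leader
        observed = V * losses[t]              # semi-bandit: chosen entries only
        total_loss += observed.sum()
        K = geometric_resampling(oracle, L_hat, V, eta, M, rng)
        L_hat += K * observed                 # estimate K_{t,i} V_{t,i} loss_{t,i}
    return total_loss, L_hat


# Example run on synthetic uniform losses (illustrative data only).
losses = np.random.default_rng(1).uniform(size=(200, 10))
total, L_hat = fpl_gr(losses, m=3, eta=0.05, M=100)
```

The learner never needs the probabilities $q_{t,i}$ explicitly. Each round costs at most `M + 1` oracle calls, which is why the approach stays efficient whenever the offline problem is.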

Keywords

Follow-the-perturbed-leader, bandit problems, online learning, combinatorial optimization