Adversarial Combinatorial Bandits with Switching Costs
IEEE Transactions on Information Theory (2024)
Abstract
We study adversarial combinatorial bandits with a switching cost λ
incurred for each selected arm that is switched in each round, considering
both the bandit feedback and semi-bandit feedback settings. In the oblivious
adversarial case with K base arms and time horizon T, we derive lower
bounds for the minimax regret and design algorithms to approach them. To prove
these lower bounds, we design stochastic loss sequences for both feedback
settings, building on an idea from Dekel et al. (2014). The
lower bound for bandit feedback is $\tilde{\Omega}\big((\lambda K)^{1/3} (TI)^{2/3}\big)$,
while that for semi-bandit feedback is $\tilde{\Omega}\big((\lambda K I)^{1/3} T^{2/3}\big)$,
where I is the number of base arms in the combinatorial arm played in each
round. To approach these lower bounds, we design algorithms that divide the
time horizon into batches, which restricts the number of switches between
actions. For the bandit feedback setting, where only the total
loss of the combinatorial arm is observed, we introduce the Batched-Exp2
algorithm which achieves a regret upper bound of
$\tilde{O}\big((\lambda K)^{1/3} T^{2/3} I^{4/3}\big)$ as $T$ tends to infinity.
In the semi-bandit feedback setting, where all losses for the combinatorial arm
are observed, we propose the Batched-BROAD algorithm which achieves a regret
upper bound of $\tilde{O}\big((\lambda K)^{1/3} (TI)^{2/3}\big)$.
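
The batching idea described above can be illustrated with a minimal Python sketch. This is not the paper's Batched-Exp2 or Batched-BROAD algorithm: the learner's update rule is replaced by a uniformly random placeholder choice, and the function names, the `batch_size` parameter, and the way switches are counted (base arms in the new action that were absent from the previous one) are all assumptions made for illustration. The sketch only shows why holding one combinatorial arm fixed within each batch caps the number of switches.

```python
import random


def batch_boundaries(T, batch_size):
    """Split rounds 0..T-1 into consecutive batches (the last may be shorter)."""
    return [(start, min(start + batch_size, T)) for start in range(0, T, batch_size)]


def count_arm_switches(actions):
    """Count base-arm switches over a sequence of combinatorial actions.

    Assumption: a switch is a base arm present in the current action
    but absent from the previous one.
    """
    total = 0
    for prev, cur in zip(actions, actions[1:]):
        total += len(set(cur) - set(prev))
    return total


def play_in_batches(T, K, I, batch_size, seed=0):
    """Play one fixed combinatorial arm (I of K base arms) per batch.

    The random choice below is a placeholder for whatever rule a real
    learner would use at the start of each batch.
    """
    rng = random.Random(seed)
    actions = []
    for start, end in batch_boundaries(T, batch_size):
        action = tuple(sorted(rng.sample(range(K), I)))
        actions.extend([action] * (end - start))
    return actions


if __name__ == "__main__":
    T, K, I, batch_size = 1000, 10, 3, 100
    actions = play_in_batches(T, K, I, batch_size)
    num_batches = len(batch_boundaries(T, batch_size))
    # Switches can only occur at batch boundaries, at most I arms each time.
    print(count_arm_switches(actions), "<=", I * (num_batches - 1))
```

With these example values there are 10 batches, so the action can change at most 9 times and the total switching cost is at most λ · I · 9, regardless of how the per-batch action is chosen. Larger batches lower this cap but let the learner adapt less often.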