Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
arxiv(2024)
摘要
Offline reinforcement learning (RL), which seeks to learn an optimal policy
using offline data, has garnered significant interest due to its potential in
critical applications where online data collection is infeasible or expensive.
This work explores the benefit of federated learning for offline RL, aiming at
collaboratively leveraging offline datasets at multiple agents. Focusing on
finite-horizon episodic tabular Markov decision processes (MDPs), we design
FedLCB-Q, a variant of the popular model-free Q-learning algorithm tailored for
federated offline RL. FedLCB-Q updates local Q-functions at agents with novel
learning rate schedules and aggregates them at a central server using
importance averaging and a carefully designed pessimistic penalty term. Our
sample complexity analysis reveals that, with appropriately chosen parameters
and synchronization schedules, FedLCB-Q achieves linear speedup in terms of the
number of agents without requiring high-quality datasets at individual agents,
as long as the local datasets collectively cover the state-action space visited
by the optimal policy, highlighting the power of collaboration in the federated
setting. In fact, the sample complexity almost matches that of the single-agent
counterpart, as if all the data are stored at a central location, up to
polynomial factors of the horizon length. Furthermore, FedLCB-Q is
communication-efficient, where the number of communication rounds is only
linear with respect to the horizon length up to logarithmic factors.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要