JointPPO: Diving Deeper into the Effectiveness of PPO in Multi-Agent Reinforcement Learning
arXiv (2024)
Abstract
While Centralized Training with Decentralized Execution (CTDE) has become the
prevailing paradigm in Multi-Agent Reinforcement Learning (MARL), it may not be
suitable for scenarios in which agents can fully communicate and share
observations with each other. Fully centralized methods, also known as
Centralized Training with Centralized Execution (CTCE) methods, can fully
utilize observations of all the agents by treating the entire system as a
single agent. However, traditional CTCE methods suffer from scalability issues
due to the exponential growth of the joint action space. To address these
challenges, in this paper we propose JointPPO, a CTCE method that uses Proximal
Policy Optimization (PPO) to directly optimize the joint policy of the
multi-agent system. JointPPO decomposes the joint policy into conditional
probabilities, transforming the decision-making process into a sequence
generation task. A Transformer-based joint policy network is constructed,
trained with a PPO loss tailored for the joint policy. JointPPO effectively
handles a large joint action space and extends PPO to the multi-agent setting with
theoretical clarity and conciseness. Extensive experiments on the StarCraft
Multi-Agent Challenge (SMAC) testbed demonstrate the superiority of JointPPO
over strong baselines. Ablation experiments and analyses are conducted to
explore the factors influencing JointPPO's performance.
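
The abstract describes factoring the joint policy into per-agent conditionals and generating the joint action autoregressively with a Transformer. The sketch below illustrates that chain-rule decomposition in PyTorch; the class name JointPolicySketch, the token layout (one observation token per agent, a start token, one decoding step per agent), and all network sizes are illustrative assumptions rather than the authors' implementation. The joint log-probability it returns is what a PPO clipped surrogate for the joint policy would be computed from.

```python
# Minimal sketch (assumptions, not the paper's code) of an autoregressive joint policy:
#   pi(a_1,...,a_n | o_1,...,o_n) = prod_i pi(a_i | o_1..o_n, a_1..a_{i-1})
import torch
import torch.nn as nn


class JointPolicySketch(nn.Module):
    """Joint policy as sequence generation: one decoding step per agent."""

    def __init__(self, n_agents, obs_dim, n_actions, d_model=64):
        super().__init__()
        self.n_agents, self.n_actions = n_agents, n_actions
        self.obs_embed = nn.Linear(obs_dim, d_model)           # one memory token per agent observation
        self.act_embed = nn.Embedding(n_actions + 1, d_model)  # extra index n_actions serves as <start>
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_actions)

    def step_logits(self, obs, prev_actions):
        # obs: (B, n_agents, obs_dim); prev_actions: (B, t) actions already chosen.
        memory = self.obs_embed(obs)                            # joint observation as decoder memory
        start = torch.full((obs.size(0), 1), self.n_actions,
                           dtype=torch.long, device=obs.device)
        tgt = self.act_embed(torch.cat([start, prev_actions], dim=1))
        t = tgt.size(1)
        causal = torch.triu(torch.full((t, t), float('-inf'), device=obs.device), diagonal=1)
        h = self.decoder(tgt, memory, tgt_mask=causal)          # causal decoding over agent slots
        return self.head(h[:, -1])                              # logits for the next agent's action

    @torch.no_grad()
    def act(self, obs):
        # Sample a_1, ..., a_n sequentially; by the chain rule, joint log-prob is a sum.
        acts = torch.zeros(obs.size(0), 0, dtype=torch.long, device=obs.device)
        joint_logp = torch.zeros(obs.size(0), device=obs.device)
        for _ in range(self.n_agents):
            dist = torch.distributions.Categorical(logits=self.step_logits(obs, acts))
            a = dist.sample()
            joint_logp = joint_logp + dist.log_prob(a)
            acts = torch.cat([acts, a.unsqueeze(1)], dim=1)
        return acts, joint_logp                                 # joint action and its log pi(a | o)


# Hypothetical usage: the joint log-prob would feed the standard PPO clipped surrogate,
# ratio = exp(new_logp - old_logp); loss = -min(ratio * adv, clip(ratio, 1-eps, 1+eps) * adv).
policy = JointPolicySketch(n_agents=3, obs_dim=10, n_actions=5)
joint_action, logp = policy.act(torch.randn(4, 3, 10))
```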