Learning to Play No-Press Diplomacy with Best Response Policy Iteration

NeurIPS 2020

Abstract
Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, poker, and StarCraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state of the art, and game-theoretic equilibrium analysis shows that the new process yields consistent improvements.
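
To make the iteration structure concrete, below is a minimal sketch of textbook fictitious play on a toy two-player zero-sum matrix game (rock-paper-scissors). It only illustrates the best-respond-to-the-average-policy loop that the paper's policy-iteration family approximates; it is not the authors' method, which uses sampled approximate best responses with neural-network policies over Diplomacy's combinatorial, simultaneous-move action space. All function and variable names here are illustrative assumptions.

```python
# Sketch: standard fictitious play on rock-paper-scissors.
# Illustrates the iterate-and-best-respond structure that best response
# policy iteration approximates; NOT the paper's Diplomacy implementation.
import numpy as np

# Row player's payoff matrix; rows/cols = rock, paper, scissors.
PAYOFF = np.array([
    [ 0, -1,  1],
    [ 1,  0, -1],
    [-1,  1,  0],
], dtype=float)

def best_response(payoff, opponent_avg):
    """Exact best response to the opponent's empirical average strategy."""
    expected = payoff @ opponent_avg          # expected value of each pure action
    br = np.zeros_like(opponent_avg)
    br[np.argmax(expected)] = 1.0             # play the best pure action
    return br

def fictitious_play(payoff, iterations=10000):
    n = payoff.shape[0]
    row_counts = np.ones(n)                   # empirical action counts
    col_counts = np.ones(n)
    for _ in range(iterations):
        row_avg = row_counts / row_counts.sum()
        col_avg = col_counts / col_counts.sum()
        # Each player best-responds to the other's historical average policy.
        row_counts += best_response(payoff, col_avg)
        col_counts += best_response(-payoff.T, row_avg)
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

if __name__ == "__main__":
    row_avg, col_avg = fictitious_play(PAYOFF)
    print("row average policy:", np.round(row_avg, 3))   # approaches uniform
    print("col average policy:", np.round(col_avg, 3))
```

In the paper's setting the exact argmax above is infeasible, so the best response is itself approximated (e.g. by sampling candidate actions from a learned policy and evaluating them with a value function), which is the core of the proposed approximate best response operator.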
Keywords
policy, learning, no-press