PPO Dash: Improving Generalization in Deep Reinforcement Learning

arXiv (2019)

Abstract
Deep reinforcement learning is prone to overfitting, and traditional benchmarks such as the Atari 2600 benchmark can exacerbate this problem. The Obstacle Tower Challenge addresses this by using randomized environments and separate seeds for training, validation, and test runs. This paper examines various improvements and best practices for the PPO algorithm, using the Obstacle Tower Challenge to empirically study their impact on generalization. Our experiments show that their combination achieves state-of-the-art performance on the Obstacle Tower Challenge.
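As context for the abstract, the sketch below shows the standard PPO clipped surrogate loss (Schulman et al., 2017) on which PPO Dash's improvements are layered. It is a minimal illustration only; the function name, signature, and default clip coefficient are assumptions for exposition, not details taken from the paper.

    # Minimal sketch (assumed names): the PPO clipped surrogate objective
    # from Schulman et al. (2017), the base algorithm PPO Dash extends.
    import torch

    def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = torch.exp(log_probs_new - log_probs_old)
        # Unclipped surrogate and its clipped counterpart.
        surr_unclipped = ratio * advantages
        surr_clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        # Pessimistic elementwise minimum, negated so an optimizer can minimize it.
        return -torch.min(surr_unclipped, surr_clipped).mean()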
Keywords
deep reinforcement learning, generalization, PPO Dash