Bridging Worlds in Reinforcement Learning with Model-Advantage

Semantic Scholar (2020)

Cited by 1 | Views 24
Abstract
Despite the breakthroughs achieved by Reinforcement Learning (RL) in recent years, RL agents often fail to perform well in unseen environments. This inability to generalize to new environments prevents their deployment in the real world. To help measure this gap in performance, we introduce model-advantage, a quantity similar to the well-known (policy) advantage function. First, we show relationships between the proposed model-advantage and generalization in RL, using which we provide guarantees on the gap in performance of an agent in new environments. Further, we conduct toy experiments to show that even a sub-optimal policy (learnt with minimal interactions with the target environment) can help predict whether a training environment (say, a simulator) helps learn policies that generalize. We then show connections with Model-Based RL.
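The abstract only states that model-advantage is "similar to the (policy) advantage function," without giving its definition. The sketch below is purely illustrative, not the paper's construction: it assumes a model-advantage-style quantity can be read as the per-state-action gap in expected next-state value between two transition models (a training model `M` and a target model `M_hat`, both hypothetical names), by analogy with the policy advantage A(s, a) = Q(s, a) − V(s).

```python
import numpy as np

# Tiny made-up tabular MDP: 2 states, 2 actions. M and M_hat are
# hypothetical transition models (training vs. target environment),
# with shape (state, action, next_state).
M = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
M_hat = np.array([[[0.7, 0.3], [0.3, 0.7]],
                  [[0.4, 0.6], [0.2, 0.8]]])
V = np.array([1.0, 3.0])   # state-value estimate for some fixed policy
gamma = 0.9                # discount factor

# Policy advantage, for reference: A(s,a) = Q(s,a) - V(s) under model M.
R = np.zeros((2, 2))       # zero rewards keep the example minimal
Q = R + gamma * (M @ V)    # Q[s,a] = R[s,a] + gamma * E_{s'~M}[V(s')]
A_policy = Q - V[:, None]

# Model-advantage-style quantity (an assumed analogy, not the paper's
# exact definition): gap in expected next-state value between models.
A_model = gamma * (M_hat @ V - M @ V)
print(A_model)
```

Under this reading, a large `A_model` flags state-action pairs where the training model's value predictions diverge from the target environment, which is the kind of signal the abstract suggests for predicting whether a simulator yields policies that generalize.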