Constrained Policy Optimization via Bayesian World Models

International Conference on Learning Representations (ICLR), 2022

Abstract
Improving sample efficiency and ensuring safety are crucial challenges when deploying reinforcement learning in high-stakes real-world applications. We propose LAMBDA, a novel model-based approach for policy optimization in safety-critical tasks modeled via constrained Markov decision processes. Our approach utilizes Bayesian world models and harnesses the resulting uncertainty to maximize optimistic upper bounds on the task objective, as well as pessimistic upper bounds on the safety constraints. We demonstrate LAMBDA's state-of-the-art performance on the Safety-Gym benchmark suite in terms of sample efficiency and constraint violation.
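
As a rough illustration of the optimistic and pessimistic bounds mentioned in the abstract, the sketch below assumes that a policy has already been evaluated under several models sampled from a Bayesian posterior, giving one estimated return and one estimated cost per sample. The function names, the sample values, and the cost budget are hypothetical and are not taken from the LAMBDA paper or its code; the sketch only shows the bounding step, not world-model learning or policy optimization.

```python
from typing import Sequence, Tuple


def optimistic_pessimistic_bounds(
    returns_per_model: Sequence[float],
    costs_per_model: Sequence[float],
) -> Tuple[float, float]:
    """Upper bounds over hypothetical posterior samples of a world model.

    Each entry is the estimated return (or cost) of one policy under one
    sampled model. The max over returns is an optimistic bound on the
    objective; the max over costs is a pessimistic bound on the constraint,
    because a higher cost is worse.
    """
    if not returns_per_model or len(returns_per_model) != len(costs_per_model):
        raise ValueError("need one return and one cost per sampled model")
    return max(returns_per_model), max(costs_per_model)


def is_pessimistically_safe(costs_per_model: Sequence[float], cost_budget: float) -> bool:
    """True only if the worst-case sampled cost stays within the budget."""
    return max(costs_per_model) <= cost_budget


# Illustrative numbers only (assumed, not experimental results).
sampled_returns = [10.0, 12.5, 11.0]
sampled_costs = [20.0, 27.0, 23.0]
objective_bound, cost_bound = optimistic_pessimistic_bounds(sampled_returns, sampled_costs)
safe = is_pessimistically_safe(sampled_costs, cost_budget=25.0)
```

Under these assumed numbers the policy would be judged unsafe, since one sampled model predicts a cost above the budget even though the others do not. Judging safety by the worst sampled model is what makes the constraint estimate pessimistic.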
Keywords
Reinforcement learning, Constrained Markov decision processes, Constrained policy optimization, Bayesian model-based RL