A Distributional View on Multi-Objective Policy Optimization

International Conference on Machine Learning (2020)

Abstract
Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over objectives in their native units. In this paper we propose a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way. We propose to learn an action distribution for each objective, and we use supervised learning to fit a parametric policy to a combination of these distributions. We demonstrate the effectiveness of our approach on challenging high-dimensional real and simulated robotics tasks, and show that setting different preferences in our framework allows us to trace out the space of nondominated solutions.
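To make the mechanism described in the abstract concrete, here is a minimal numerical sketch, not the paper's implementation: it assumes a one-dimensional Gaussian policy, hand-crafted advantage estimates for two objectives in different scales, and a simple temperature heuristic standing in for the per-objective optimization the paper performs. The names (advantages, epsilons, improved_distribution) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Current parametric policy: a one-dimensional Gaussian, for illustration only.
mu, sigma = 0.0, 1.0
actions = rng.normal(mu, sigma, size=256)

# Hypothetical advantage estimates for two competing objectives,
# deliberately on different scales to mimic objectives in different units.
advantages = {
    "task_reward": -(actions - 1.0) ** 2,    # prefers actions near +1
    "energy_cost": -10.0 * np.abs(actions),  # prefers small-magnitude actions
}

# Per-objective preference epsilon_k: in the paper this bounds how far the
# improved distribution for objective k may move from the current policy,
# which is what makes preferences scale-invariant. Here it only sets a
# temperature heuristically (larger epsilon -> stronger reweighting).
epsilons = {"task_reward": 0.1, "energy_cost": 0.01}

def improved_distribution(adv, eps):
    """Exponentiated-advantage weights over the sampled actions.

    A crude stand-in for the per-objective policy improvement step;
    the paper instead obtains the temperature by solving a convex dual
    problem for each epsilon_k.
    """
    temperature = 1.0 / eps
    w = np.exp((adv - adv.max()) / temperature)
    return w / w.sum()

per_objective = {k: improved_distribution(advantages[k], eps)
                 for k, eps in epsilons.items()}

# Supervised-learning step: fit the parametric policy to the combination of
# the per-objective action distributions. For a Gaussian policy, weighted
# maximum likelihood reduces to weighted moment matching over the samples.
combined = np.mean([per_objective[k] for k in epsilons], axis=0)
new_mu = float(np.sum(combined * actions))
new_sigma = float(np.sqrt(np.sum(combined * (actions - new_mu) ** 2)))
print(f"updated policy: mu={new_mu:.3f}, sigma={new_sigma:.3f}")

Varying the epsilon values shifts the fitted policy toward one objective or the other without ever expressing preferences in the objectives' native units, which is the scale-invariance the abstract refers to.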
Keywords
optimization, distributional view, policy, multi-objective