Distributional Reward Decomposition for Reinforcement Learning

Zichuan Lin, Li Zhao, Derek C Yang, Tao Qin, Tie-Yan Liu, Guangwen Yang

NeurIPS (2019)

Abstract

Many reinforcement learning (RL) tasks have specific properties that can be leveraged to adapt existing RL algorithms to those tasks and further improve performance; one general class of such properties is the multiple reward channel structure. In such environments, the full reward can be decomposed into sub-rewards obtained from different channels. Existing work on reward decomposition either requires prior knowledge of the environment to decompose the full reward, or decomposes the reward without prior knowledge but at the cost of degraded performance. In this paper, we propose Distributional Reward Decomposition for Reinforcement Learning (DRDRL), a novel reward decomposition algorithm that captures the multiple reward channel structure under the distributional setting. Empirically, our method captures the multi-channel structure and discovers meaningful reward decompositions without requiring any prior knowledge. Consequently, our agent achieves better performance than existing methods on environments with multiple reward channels.
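
To make the idea of reward channels concrete, here is a minimal sketch of a full reward that splits into sub-rewards. It is an illustration only, not the DRDRL algorithm: the channel names, reward values, and discount factor are hypothetical. The helper `convolve` additionally assumes that the sub-returns are independent and take integer values on a small grid, in which case the distribution of their sum is the convolution of the per-channel distributions; the abstract does not state that this is how the paper combines channels.

```python
GAMMA = 0.9  # hypothetical discount factor


def discounted_return(rewards, gamma=GAMMA):
    """Sum of gamma**t * r_t over one episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Per-step sub-rewards from two hypothetical channels.
channel_rewards = {
    "collect": [1.0, 0.0, 1.0, 0.0],
    "survive": [0.0, -1.0, 0.0, 2.0],
}

# The full reward at each step is the sum of the sub-rewards.
full_rewards = [sum(step) for step in zip(*channel_rewards.values())]

# Because the return is linear in the reward, it decomposes the same way.
sub_returns = {name: discounted_return(r) for name, r in channel_rewards.items()}
full_return = discounted_return(full_rewards)


def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q.

    p[i] is the probability that X equals i, and q[j] that Y equals j.
    Independence is an assumption made for this illustration.
    """
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


# Two hypothetical sub-return distributions over the values {0, 1}.
full_distribution = convolve([0.5, 0.5], [0.5, 0.5])
```

In this toy setting the sub-rewards are given up front. The setting described in the abstract is harder: only the full reward is observed, and the decomposition has to be discovered without prior knowledge.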

Keywords

reinforcement learning, specific properties
