Exploiting Reward Machines with Deep Reinforcement Learning in Continuous Action Domains

Haolin Sun, Yves Lespérance

EUMAS (2023)

Abstract
In this paper, we address the challenges of non-Markovian rewards and learning efficiency in deep reinforcement learning (DRL) in continuous action domains by exploiting reward machines (RMs) and counterfactual experiences for reward machines (CRM), both proposed by Toro Icarte et al. A reward machine can decompose a task, convey its high-level structure to an agent, and support certain non-Markovian task specifications. We integrate state-of-the-art DRL algorithms with RMs to improve learning efficiency. Our experimental results show that Soft Actor-Critic with counterfactual experiences for RMs (SAC-CRM) learns better policies faster, while Deep Deterministic Policy Gradient with counterfactual experiences for RMs (DDPG-CRM) learns more slowly and achieves lower rewards, but is more stable. Option-based Hierarchical Reinforcement Learning for reward machines (HRM) and Twin Delayed Deep Deterministic Policy Gradient (TD3) with CRM generally underperform relative to SAC-CRM and DDPG-CRM. This work contributes to the ongoing development of more efficient and robust DRL approaches by leveraging the potential of RMs in practical problem-solving scenarios.
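The abstract describes CRM only at a high level. As a rough sketch of the core idea from Toro Icarte et al., the Python snippet below shows how one environment transition can be expanded into a counterfactual experience for every reward-machine state, so an off-policy learner such as SAC or DDPG can reuse a single transition across the whole machine. All names here (RewardMachine, delta_u, delta_r, the labeling output label) are illustrative assumptions, not the paper's actual code.

```python
# A minimal sketch of counterfactual experiences for reward machines (CRM),
# following Toro Icarte et al.; class and function names are illustrative.

class RewardMachine:
    """Minimal reward machine: finite state set with transition/reward functions."""

    def __init__(self, states, terminal_states, delta_u, delta_r):
        self.states = states                    # finite set of RM states U
        self.terminal_states = terminal_states  # RM states that end the task
        self.delta_u = delta_u                  # (u, label) -> next RM state u'
        self.delta_r = delta_r                  # (u, label) -> scalar reward


def crm_experiences(rm, s, a, s_next, label):
    """Expand one environment transition into |U| counterfactual experiences.

    `label` is the proposition set produced by the labeling function
    L(s, a, s_next). Each tuple can be pushed into the replay buffer of an
    off-policy learner such as SAC or DDPG, which is what enables the
    counterfactual reuse.
    """
    experiences = []
    for u in rm.states:                  # pretend the agent was in RM state u
        u_next = rm.delta_u(u, label)    # where the RM would have gone
        r = rm.delta_r(u, label)         # reward the RM would have emitted
        done = u_next in rm.terminal_states
        # the learner operates on the cross-product state (s, u)
        experiences.append(((s, u), a, r, (s_next, u_next), done))
    return experiences


# Example: a two-state RM that gives reward 1 for reaching proposition "goal".
rm = RewardMachine(
    states={0, 1},
    terminal_states={1},
    delta_u=lambda u, label: 1 if "goal" in label else u,
    delta_r=lambda u, label: 1.0 if (u == 0 and "goal" in label) else 0.0,
)
batch = crm_experiences(rm, s=(0.0, 0.0), a=(0.1,), s_next=(0.1, 0.0), label={"goal"})
# batch holds one experience per RM state, e.g. reward 1.0 starting from state 0.
```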
Keywords
deep reinforcement learning, reward machines, continuous action domains, reinforcement learning