Efficient Sparse-Reward Goal-Conditioned Reinforcement Learning with a High Replay Ratio and Regularization
CoRR (2023)
Abstract
Reinforcement learning (RL) methods with a high replay ratio (RR) and
regularization have gained interest due to their superior sample efficiency.
However, these methods have mainly been developed for dense-reward tasks. In
this paper, we aim to extend these RL methods to sparse-reward goal-conditioned
tasks. We use Randomized Ensemble Double Q-learning (REDQ) (Chen et al., 2021),
an RL method with a high RR and regularization. To apply REDQ to sparse-reward
goal-conditioned tasks, we make the following modifications to it: (i) using
hindsight experience replay and (ii) bounding target Q-values. We evaluate REDQ
with these modifications on 12 sparse-reward goal-conditioned tasks from the Robotics suite
(Plappert et al., 2018), and show that it achieves about $2 \times$ better
sample efficiency than previous state-of-the-art (SoTA) RL methods.
Furthermore, we reconsider the necessity of specific components of REDQ and
simplify it by removing unnecessary ones. The simplified REDQ with our
modifications achieves $\sim 8 \times$ better sample efficiency than the SoTA
methods on 4 Fetch tasks from the Robotics suite.
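
The two modifications named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on several assumptions that the abstract does not state: the reward is -1 per step until the goal is reached and 0 afterwards, hindsight relabeling uses the final achieved state as the substitute goal, and target Q-values are clipped to the range [-1/(1 - gamma), 0] implied by that reward convention. The function names, the tolerance `tol`, and the default `gamma` are hypothetical.

```python
import math


def sparse_reward(achieved, goal, tol=0.05):
    """Assumed convention: 0 if the goal is reached within tol, else -1."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def relabel_with_final_goal(achieved_goals, tol=0.05):
    """Hindsight relabeling sketch (assumed 'final' strategy).

    Treats the last achieved state of a trajectory as if it had been
    the goal, and recomputes the sparse reward for every step.
    Returns a list of (relabeled_goal, reward) pairs, one per step.
    """
    new_goal = achieved_goals[-1]
    return [(new_goal, sparse_reward(a, new_goal, tol)) for a in achieved_goals]


def bounded_target(reward, next_q, gamma=0.98):
    """Clip a one-step bootstrapped target to the assumed feasible range.

    With rewards in {-1, 0}, discounted returns lie in [-1/(1-gamma), 0],
    so any target outside that interval cannot be a true return.
    """
    q_min = -1.0 / (1.0 - gamma)
    target = reward + gamma * next_q
    return min(0.0, max(q_min, target))


if __name__ == "__main__":
    trajectory = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
    for goal, reward in relabel_with_final_goal(trajectory):
        print(goal, reward)
    print(bounded_target(-1.0, -100.0))
```

Under this sketch, a trajectory that never reached its original goal still yields a transition with reward 0 after relabeling, and an overestimated or underestimated bootstrap value is pulled back into the range a true return could occupy.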