Contrastive Diffuser: Planning Towards High Return States via Contrastive Learning
CoRR (2024)
Abstract
Applying diffusion models in reinforcement learning for long-term planning
has gained much attention recently. Several diffusion-based methods have
successfully leveraged the modeling capabilities of diffusion for arbitrary
distributions. These methods generate subsequent trajectories for planning and
have demonstrated significant improvement. However, these methods are limited
by their plain base distributions and by their overlooking of sample diversity:
different states yield different returns. They simply use diffusion to learn
the distribution of the offline dataset and generate trajectories whose states
follow the same distribution as that dataset. As a result, the probability of
these models reaching high-return states depends largely on the dataset
distribution. Even when equipped with a guidance model, their performance
remains suppressed. To address these limitations, in
this paper, we propose a novel method called CDiffuser, which devises a return
contrast mechanism to pull the states in generated trajectories towards
high-return states while pushing them away from low-return states to improve
the base distribution. Experiments on 14 commonly used D4RL benchmarks
demonstrate the effectiveness of our proposed method. Our code is publicly
available at https://anonymous.4open.science/r/ContrastiveDiffuser.
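The return contrast mechanism described above can be illustrated with an InfoNCE-style loss: a generated state is pulled toward high-return states (positives) and pushed away from low-return states (negatives). The sketch below is a minimal illustration of this idea, not the paper's actual implementation; the function name, cosine similarity choice, and temperature value are all assumptions for exposition.

```python
import numpy as np

def return_contrastive_loss(state, high_return_states, low_return_states, tau=0.1):
    """Illustrative InfoNCE-style contrast over returns (hypothetical sketch).

    state:              (d,) generated state vector
    high_return_states: (n_pos, d) states with high returns (positives)
    low_return_states:  (n_neg, d) states with low returns (negatives)
    tau:                temperature controlling sharpness
    """
    def cos_sim(a, b):
        # cosine similarity between `a` and each row of `b`
        a = a / np.linalg.norm(a)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return b @ a

    pos = np.exp(cos_sim(state, high_return_states) / tau)
    neg = np.exp(cos_sim(state, low_return_states) / tau)
    # Minimizing this loss pulls `state` toward the positives (numerator)
    # while pushing it away from the negatives (denominator).
    return -np.log(pos.sum() / (pos.sum() + neg.sum()))
```

Under such a loss, a state near the high-return cluster incurs a smaller penalty than one near the low-return cluster, which is the direction of the "pull towards / push away" behavior the abstract describes.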