
Improved DDPG algorithm based on offline model pre-training learning

ZHANG Qian, WANG Hong-ge, NI Liang

Computer Engineering and Design (2022)

Abstract
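DDPG (deep deterministic policy gradient) tends to fall into local minima during online training and generates a large number of trial-and-error actions and invalid data. To address these problems, an improved DDPG algorithm based on offline model pre-training is proposed. Existing data are used to train an object state model and a value reward model offline, and the action network and value network of DDPG are pre-trained with them in advance, which reduces DDPG's early-stage workload and improves the quality of online learning. A DDQN (double deep Q-learning network) structure is added to solve the problem of overestimated Q values. In simulation, the average cumulative reward increased by 9.15%, indicating that the improved algorithm effectively improves the performance of DDPG.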
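
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify model architectures, so the least-squares models, the hand-picked quadratic reward features, the synthetic logged data, and all function names (`fit_linear_model`, `predict`, `double_q_target`) are assumptions made for illustration. The target function shows the standard Double DQN rule for discrete actions; how the paper adapts it to DDPG's continuous actions is not stated in the abstract.

```python
import numpy as np


def fit_linear_model(inputs, targets):
    """Least-squares fit of targets ~= [inputs, 1] @ W (hypothetical stand-in
    for the offline-trained state and reward models)."""
    X = np.hstack([inputs, np.ones((inputs.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return W


def predict(W, inputs):
    """Apply a model fitted by fit_linear_model."""
    X = np.hstack([inputs, np.ones((inputs.shape[0], 1))])
    return X @ W


def double_q_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Standard Double DQN target (discrete actions): the online network
    selects the next action, the target network evaluates it."""
    best_action = int(np.argmax(q_online_next))
    bootstrap = 0.0 if done else gamma * float(q_target_next[best_action])
    return reward + bootstrap


# Synthetic "existing data" (assumed): linear dynamics, quadratic reward.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 2))
actions = rng.normal(size=(200, 1))
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.5]])
next_states = states @ A.T + actions @ B.T
rewards = -(states ** 2).sum(axis=1, keepdims=True) - 0.1 * actions ** 2

# Offline step: fit the state model and the reward model from logged data.
state_action = np.hstack([states, actions])
state_model = fit_linear_model(state_action, next_states)
reward_features = np.hstack([states ** 2, actions ** 2])  # assumed features
reward_model = fit_linear_model(reward_features, rewards)

# The fitted models can then generate transitions for pre-training the
# actor and critic before any online interaction takes place.
predicted_next = predict(state_model, state_action)
predicted_reward = predict(reward_model, reward_features)

# Double DQN target for one transition with three discrete actions.
target = double_q_target(
    reward=1.0,
    q_online_next=np.array([0.2, 0.9, 0.1]),
    q_target_next=np.array([5.0, 2.0, 7.0]),
    gamma=0.5,
)
```

In the example call, the online estimates pick action 1, so the target bootstraps from the target network's value 2.0 rather than its maximum 7.0, which is how the decoupling curbs overestimation.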