Deep Imitation Learning for Optimal Trajectory Planning and Initial Condition Optimization for an Unstable Dynamic System

Bo-Hsun Chen, Pei-Chun Lin

Advanced Intelligent Systems (2024)

Abstract
In this article, an innovative offline deep imitation learning algorithm for optimal trajectory planning is proposed. While many state-of-the-art works have achieved optimal trajectory planning, their systems were stable or quasi-stable, and their approaches rarely optimized the system's initial conditions (ICs). Here, a new unstable dynamic system task called "internal sliding object stabilization control" is proposed, modeled, and solved by deep imitation learning. Given the system's ICs, the neural networks (NNs) imitate the iterative linear quadratic regulator (iLQR), generate optimal trajectories, and compute faster than the iLQR. A proportional-integral-derivative (PID) controller is used to track the unstable trajectories. Leveraging the gradients of the NNs, the algorithm can optimize the system's ICs, avoid obstacles stepwise, and bound the worst-case behavior of the NNs for safety. Thorough simulations are then conducted, including comparing the iLQR and PID controllers on the task, optimizing different system ICs by gradient descent, and finding the worst-case performance bound by gradient ascent. Results show that the proposed algorithm achieves considerably improved performance. Finally, experiments are conducted with a real manipulator to compare the proposed structure with the original iLQR; the results indicate that the proposed algorithm closely reproduces the iLQR. Program code and experiment results are available at https://github.com/DanielYamChen/ISOSC.git.

A deep imitation learning structure is proposed for optimal trajectory planning of an unstable dynamic system. Given the system's initial conditions, the neural networks (NNs) imitate the iterative linear quadratic regulator to generate near-optimal trajectories. Leveraging NN gradients, optimizing the system's initial conditions, avoiding obstacles, and finding the NN worst-case bounds are achieved in simulations, with empirical experiments also conducted.
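To illustrate the IC-optimization idea summarized above, the following Python/PyTorch sketch shows gradient descent on a system's initial conditions through a trained trajectory network, and the same loop run as gradient ascent to probe a worst-case bound. This is a minimal sketch under assumed interfaces, not the authors' implementation: traj_net, trajectory_cost, the 4-dimensional IC vector, and the 50-step trajectory are hypothetical stand-ins; the actual code is in the linked repository.

# Illustrative sketch (not the paper's code): optimize ICs by gradient descent
# through a trained NN that maps ICs to a trajectory, or probe a worst-case
# bound by running the same loop as gradient ascent.
import torch

def optimize_ics(traj_net, trajectory_cost, ic_init, steps=200, lr=1e-2, maximize=False):
    """Adjust the ICs so the NN-generated trajectory minimizes (descent) or
    maximizes (ascent, for the worst-bound search) a differentiable cost."""
    ic = ic_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([ic], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        traj = traj_net(ic)             # NN imitating iLQR: ICs -> trajectory
        cost = trajectory_cost(traj)    # e.g., tracking error plus control effort
        (-cost if maximize else cost).backward()
        opt.step()
    return ic.detach()

# Toy usage with a stand-in network and a quadratic trajectory cost:
if __name__ == "__main__":
    traj_net = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.Tanh(),
                                   torch.nn.Linear(64, 50))   # 50-step trajectory
    cost_fn = lambda traj: (traj ** 2).mean()
    ic0 = torch.randn(4)
    best_ic = optimize_ics(traj_net, cost_fn, ic0)                   # gradient descent
    worst_ic = optimize_ics(traj_net, cost_fn, ic0, maximize=True)   # gradient ascent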
Keywords
deep imitation learning,gradient descent,obstacle avoidance,optimal control,optimal trajectory planning,safe machine learning,trajectory optimization