Inverse Value Iteration and $Q$ -Learning: Algorithms, Stability, and Robustness
IEEE Transactions on Neural Networks and Learning Systems (2024)
Abstract
This article proposes a data-driven, model-free inverse $Q$-learning algorithm for continuous-time linear quadratic regulators (LQRs). Using an agent's trajectories of states and optimal control inputs, the algorithm reconstructs a cost function that reproduces the same trajectories. The article first presents a model-based inverse value iteration scheme that uses the agent's system dynamics. An online, model-free inverse $Q$-learning algorithm is then developed to recover the agent's cost function using only the demonstrated trajectories. It is more efficient than existing inverse reinforcement learning (RL) algorithms because it avoids repeated RL in inner loops. The proposed algorithms require no initial stabilizing control policy and yield unbiased solutions, with guaranteed asymptotic stability, convergence, and robustness. Theoretical analysis and simulation examples demonstrate the effectiveness and advantages of the proposed algorithms.
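To make the inverse-optimal-control idea concrete, here is a minimal sketch for a scalar continuous-time system $\dot x = ax + bu$ with cost $\int q x^2 + r u^2\,dt$. This is *not* the paper's model-free inverse $Q$-learning algorithm; it only illustrates the underlying model-based principle: given the demonstrated optimal gain $k$ (with $u = -kx$), invert the Riccati equation to recover a cost $(q, r)$ for which $k$ is optimal. Costs are identifiable only up to scale, so $r = 1$ is fixed by convention.

```python
import math

def forward_lqr(a, b, q, r):
    """Solve the scalar continuous-time Riccati equation
    2*a*p - (b*p)**2 / r + q = 0 for the stabilizing (positive) root p,
    and return the optimal feedback gain k = b*p / r."""
    p = (a * r + math.sqrt((a * r) ** 2 + q * r * b ** 2)) / b ** 2
    return b * p / r

def inverse_lqr(a, b, k, r=1.0):
    """Given dynamics (a, b) and a demonstrated optimal gain k,
    recover the state weight q consistent with k being LQR-optimal
    for the cost (q, r)."""
    p = k * r / b                        # invert k = b*p/r
    q = (b * p) ** 2 / r - 2 * a * p     # rearrange the Riccati equation
    return q

a, b = 1.0, 1.0
k_demo = forward_lqr(a, b, q=3.0, r=1.0)  # demonstrated optimal gain
q_hat = inverse_lqr(a, b, k_demo)         # recovered cost weight
print(k_demo, q_hat)  # → 3.0 3.0 (the true q is recovered)
```

Re-solving the forward problem with the recovered $(q, r)$ returns the demonstrated gain, which is the consistency property the article's data-driven algorithms establish without access to $(a, b)$.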
Keywords
Heuristic algorithms, Trajectory, Optimal control, Cost function, Q-learning, Mathematical models, Robustness, Convergence, inverse optimal control (IOC), inverse reinforcement learning (RL), model-free, stability