Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator
CoRR (2024)
Abstract
The risk-sensitive linear quadratic regulator is one of the most fundamental
problems in risk-sensitive optimal control. In this paper, we study online
adaptive control of the risk-sensitive linear quadratic regulator in the
finite-horizon episodic setting. We propose a simple least-squares greedy algorithm
and show that it achieves 𝒪(log N) regret under a
specific identifiability assumption, where N is the total number of episodes.
If the identifiability assumption is not satisfied, we propose incorporating
exploration noise into the least-squares-based algorithm, resulting in an
algorithm with 𝒪(√N) regret. To the best of our
knowledge, this is the first set of regret bounds for the episodic risk-sensitive
linear quadratic regulator. Our proof relies on perturbation analysis of
less-standard Riccati equations for risk-sensitive linear quadratic control,
and a delicate analysis of the loss in the risk-sensitive performance criterion
due to applying the suboptimal controller in the online learning process.
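
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the least-squares step such a method might rest on, assuming linear dynamics x_{t+1} = A x_t + B u_t + w_t with Gaussian noise. The function names (`estimate_dynamics`, `collect_episode`), the fixed feedback gain `K`, the ridge term `reg`, and the noise levels are all hypothetical choices for illustration, not taken from the paper. The greedy step, which would recompute a controller from the current estimate via the risk-sensitive Riccati equations, is deliberately left out.

```python
import numpy as np

def estimate_dynamics(states, inputs, next_states, reg=1e-6):
    """Ridge-regularized least-squares estimate of (A, B) from transitions.

    states: (T, n), inputs: (T, m), next_states: (T, n).
    The small ridge term `reg` is an assumption that keeps the solve well posed.
    """
    Z = np.hstack([states, inputs])                 # regressors z_t = [x_t, u_t]
    G = Z.T @ Z + reg * np.eye(Z.shape[1])          # regularized Gram matrix
    theta = np.linalg.solve(G, Z.T @ next_states).T  # (n, n + m) = [A_hat, B_hat]
    n = states.shape[1]
    return theta[:, :n], theta[:, n:]

def collect_episode(A, B, K, horizon, rng, noise_std=0.1, explore_std=0.0):
    """Roll out one episode under u_t = K x_t plus optional exploration noise."""
    n, m = B.shape
    x = np.zeros(n)
    xs, us, xns = [], [], []
    for _ in range(horizon):
        u = K @ x + explore_std * rng.standard_normal(m)
        x_next = A @ x + B @ u + noise_std * rng.standard_normal(n)
        xs.append(x)
        us.append(u)
        xns.append(x_next)
        x = x_next
    return np.array(xs), np.array(us), np.array(xns)
```

Setting `explore_std > 0` mimics the exploration-noise variant mentioned above for the case where the identifiability assumption fails; with `explore_std = 0` the data come only from the current controller and the process noise.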