Efficient policy evaluation by matrix sketching

Frontiers of Computer Science (2022)

Abstract
In reinforcement learning, policy evaluation aims to predict the long-term value of a state under a given policy. As high-dimensional representations become increasingly common in reinforcement learning, reducing the computational cost of policy evaluation becomes a significant problem. Many recent works adopt matrix sketching methods to accelerate least-squares temporal difference (TD) algorithms and quasi-Newton TD algorithms. Among these sketching methods, truncated incremental SVD performs best because it is stable and efficient. However, the convergence properties of incremental SVD remain open. In this paper, we first show that conventional incremental SVD algorithms can incur enormous approximation errors in the worst case. We then propose a variant of incremental SVD with better theoretical guarantees, which periodically shrinks the singular values. Moreover, we employ our improved incremental SVD to accelerate least-squares TD and quasi-Newton TD algorithms. Experimental results verify the correctness and effectiveness of our methods.
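The periodic shrinking of singular values described above is closely related to frequent-directions-style matrix sketching. The following is a minimal, hypothetical Python sketch of that idea, not the paper's algorithm: the function names, the 2ℓ-row buffer, and the shrinkage rule are our assumptions. It maintains a small sketch B of a stream of feature rows so that AᵀA ≈ BᵀB, and whenever the buffer fills, it takes an SVD and subtracts the ℓ-th squared singular value from all of them.

```python
import numpy as np

def shrink_sketch(B, ell):
    # SVD of the full buffer; subtract the ell-th squared singular value
    # (frequent-directions-style shrinkage) and keep the top ell directions.
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    s2 = np.maximum(s ** 2 - s[ell - 1] ** 2, 0.0)
    return np.sqrt(s2[:ell])[:, None] * Vt[:ell]

def incremental_sketch(rows, d, ell):
    # Maintain an ell x d sketch B of a stream of d-dimensional rows so
    # that A^T A is approximated by B^T B. Assumes ell <= d.
    B = np.zeros((2 * ell, d))
    filled = 0
    for a in rows:
        if filled == 2 * ell:          # buffer full: shrink periodically
            B[:ell] = shrink_sketch(B, ell)
            B[ell:] = 0.0
            filled = ell
        B[filled] = a
        filled += 1
    B[:ell] = shrink_sketch(B, ell)    # final shrink before returning
    return B[:ell]

# Usage: sketch a random low-rank-ish matrix and check the covariance error.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 50)) @ rng.standard_normal((50, 50))
B = incremental_sketch(A, d=50, ell=10)
print(np.linalg.norm(A.T @ A - B.T @ B, 2))  # small relative to ||A||_F^2
```

In a least-squares TD setting, the streamed rows would be the state feature vectors, so the sketch stands in for the full feature covariance at a fraction of the memory cost.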
Keywords
temporal difference learning, policy evaluation, matrix sketching