Least-Squares Temporal Difference Learning

ICML (1999)

Cited by 327 | Views 582
Abstract
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it makes inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and λ = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996) eliminates all stepsize parameters and improves data efficiency. This paper extends Bradtke and Barto's work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from λ = 0 to arbitrary values of λ; at the extreme of λ = 1, the resulting algorithm is shown to be a practical formulation of supervised linear regression. Third, it presents a novel, intuitive interpretation of LSTD as a model-based reinforcement learning technique.
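The abstract describes LSTD(λ) only at a high level. As a rough illustration of the underlying computation (a minimal sketch of the standard LSTD(λ) statistics, not the paper's own pseudocode), the code below accumulates A = Σ_t z_t (φ(s_t) − γ φ(s_{t+1}))ᵀ and b = Σ_t z_t r_t, with eligibility trace z_t = γλ z_{t−1} + φ(s_t), then solves A w = b for the value-function weights. The names `lstd_lambda` and `phi`, the (state, reward, next_state, done) transition format, and the ridge term are illustrative assumptions, not from the paper.

```python
import numpy as np

def lstd_lambda(transitions, phi, n_features, gamma=0.95, lam=0.0, ridge=1e-6):
    """Estimate weights w such that V(s) ~= phi(s) @ w via LSTD(lambda).

    transitions: iterable of (state, reward, next_state, done) tuples
    phi: feature map, state -> np.ndarray of shape (n_features,)
    """
    A = ridge * np.eye(n_features)   # small ridge term keeps A invertible
    b = np.zeros(n_features)
    z = np.zeros(n_features)         # eligibility trace
    for s, r, s_next, done in transitions:
        f = phi(s)
        f_next = np.zeros(n_features) if done else phi(s_next)
        z = gamma * lam * z + f                  # decay and accumulate trace
        A += np.outer(z, f - gamma * f_next)     # A += z (phi - gamma phi')^T
        b += r * z                               # b += z r
        if done:
            z = np.zeros(n_features)             # reset trace between episodes
    return np.linalg.solve(A, b)

# Hypothetical usage: a 3-state chain with one-hot features.
phi = lambda s: np.eye(3)[s]
transitions = [(0, 0.0, 1, False), (1, 0.0, 2, False), (2, 1.0, 2, True)]
w = lstd_lambda(transitions, phi, 3, gamma=0.9, lam=0.5)
print(w)  # approximate state values under the sampled policy
```

Note that, unlike incremental TD(λ), no stepsize appears anywhere: the per-transition statistics are simply summed and the linear system is solved once, which is the data-efficiency and tuning advantage the abstract refers to.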
Keywords
least-squares temporal difference learning, linear regression, value function, temporal difference learning, reinforcement learning, least squares