Finite Sample Analyses for TD(0) With Function Approximation.

Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (2018)

Cited by 170 | Views 57
Abstract
TD(0) is one of the most commonly used algorithms in reinforcement learning. Despite this, there is no existing finite sample analysis for TD(0) with function approximation, even for the linear case. Our work is the first to provide such results. Works that managed to obtain convergence rates for online Temporal Difference (TD) methods analyzed somewhat modified versions of them that include projections and step-sizes that depend on unknown problem parameters. Our analysis obviates these artificial alterations by exploiting strong properties of TD(0). We provide convergence rates both in expectation and with high probability. Both are based on relatively unknown, recently developed stochastic approximation techniques.
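For reference, the unmodified TD(0) update with linear function approximation that the abstract refers to can be sketched as follows. This is a minimal illustration, not the paper's analysis: the toy two-state Markov chain (`P`, `r`), the identity feature map `phi`, and the constant step size `alpha` are all assumptions chosen for the example.

```python
import numpy as np

def td0_linear(phi, P, r, gamma=0.9, alpha=0.05, steps=5000, seed=0):
    """Plain online TD(0): theta += alpha * delta * phi(s), with no
    projection and a step size not tuned to problem parameters."""
    rng = np.random.default_rng(seed)
    n_states, d = phi.shape
    theta = np.zeros(d)
    s = 0
    for _ in range(steps):
        s_next = rng.choice(n_states, p=P[s])
        # TD error: delta = r(s) + gamma * V(s') - V(s), where V(s) = phi(s) @ theta
        delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
        theta = theta + alpha * delta * phi[s]
        s = s_next
    return theta

# Toy 2-state chain (hypothetical): symmetric transitions, reward 1 in state 0.
# With identity features, theta directly approximates the value function,
# whose true values here are V = (5.5, 4.5).
phi = np.eye(2)
P = np.array([[0.5, 0.5], [0.5, 0.5]])
r = np.array([1.0, 0.0])
theta = td0_linear(phi, P, r)
```

The finite-sample question the paper addresses is how fast `theta` approaches the TD fixed point as a function of `steps`, without adding the projection step or parameter-dependent step sizes used in earlier analyses.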