Uncertainty Quantification and Exploration for Reinforcement Learning

Operations Research（2023）

引用 0|浏览7

暂无评分

摘要

Quantify the uncertainty to decide and explore better In statistical inference, large-sample behavior and confidence interval construction are fundamental in assessing the error and reliability of estimated quantities with respect to the data noises. In the paper “Uncertainty Quantification and Exploration for Reinforcement Learning”, Dong, Lam, and Zhu study the large sample behavior in the classic setting of reinforcement learning. They derive appropriate large-sample asymptotic distributions for the state-action value function (Q-value) and optimal value function estimations when data are collected from the underlying Markov chain. This allows one to evaluate the assertiveness of performances among different decisions. The tight uncertainty quantification also facilitates the development of a pure exploration policy by maximizing the worst-case relative discrepancy among the estimated Q-values (ratio of the mean squared difference to the variance). This exploration policy aims to collect informative training data to maximize the probability of learning the optimal reward collecting policy, and it achieves good empirical performance.

查看译文

关键词

uncertainty,exploration,quantification,reinforcement,learning

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要