Utile Context Tree Weighting

neural information processing systems(2017)

引用 22|浏览37
暂无评分
摘要
Reinforcement learning (RL) in partially observable settings is challenging because the agent’s immediate observations are not Markov. Recently proposed methods can learn variable-order Markov models of the underlying process but have steep memory requirements and are sensitive to aliasing between observation histories due to sensor noise. This paper proposes utile context tree weighting (UCTW), a model-learning method that addresses these limitations. UCTW dynamically expands a suffix tree while ensuring that the total size of the model, but not its depth, remains bounded. We show that UCTW approximately matches the performance of state-of-the-art alternatives at stochastic time-series prediction while using at least an order of magnitude less memory. We also apply UCTW to model-based RL, showing that, on tasks that require memory of past observations, UCTW can learn without prior knowledge of a good state representation, or even the length of history upon which such a representation should depend.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要