Planning and Learning in Partially Observable Systems via Filter Stability

Proceedings of the 55th Annual ACM Symposium on Theory of Computing (STOC 2023)

Abstract
Partially Observable Markov Decision Processes (POMDPs) are an important model in reinforcement learning that takes into account the agent's uncertainty about its current state. In the literature on POMDPs, it is customary to assume access to a planning oracle that computes an optimal policy when the parameters are known, even though this problem is known to be computationally hard. The major obstruction is the Curse of History, which arises because optimal policies for POMDPs may depend on the entire observation history thus far. In this work, we revisit the planning problem and ask: Are there natural and well-motivated assumptions that avoid the Curse of History in POMDP planning (and beyond)? We assume one-step observability, which stipulates that well-separated distributions on states lead to well-separated distributions on observations. Our main technical result is a new quantitative bound for filter stability in observable Hidden Markov Models (HMMs) and POMDPs, i.e., the rate at which the Bayes filter for the latent state forgets its initialization. We give the following algorithmic applications. First, a quasipolynomial-time algorithm for planning in one-step observable POMDPs and a matching computational lower bound under the Exponential Time Hypothesis; crucially, we require no assumptions on the transition dynamics of the POMDP. Second, a quasipolynomial-time algorithm for improper learning of overcomplete HMMs, which does not require full-rank transitions; full-rankness is violated, for instance, when the number of latent states varies over time. Instead we assume multi-step observability, a generalization of observability which allows observations to be informative in aggregate. Third, a quasipolynomial-time algorithm for computing approximate coarse correlated equilibria in one-step observable Partially Observable Markov Games (POMGs).
Thus we show that observability gives a blueprint for circumventing computational intractability in a variety of settings with partial observations, including planning, learning and computing equilibria.
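The filter stability phenomenon at the heart of the abstract can be illustrated numerically: run the Bayes filter for the same HMM on the same observation sequence from two different priors, and watch the beliefs merge. Below is a minimal sketch with a hypothetical 3-state HMM whose observation matrix has well-separated rows, a crude stand-in for the one-step observability assumption; the parameters are invented for illustration and are not from the paper.

```python
import random

random.seed(0)

# Toy 3-state HMM (hypothetical parameters, chosen for illustration).
# T[i][j] = P(next state = j | state = i); O[i][k] = P(obs = k | state = i).
# The rows of O are well separated, mimicking one-step observability.
T = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
O = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]

def filter_step(belief, obs):
    """One Bayes-filter update: predict through T, then reweight by O."""
    predicted = [sum(belief[i] * T[i][j] for i in range(3)) for j in range(3)]
    updated = [predicted[j] * O[j][obs] for j in range(3)]
    z = sum(updated)
    return [u / z for u in updated]

# Run two filters on the same observation stream from very different priors.
state = 0
b1 = [1.0, 0.0, 0.0]   # prior concentrated on the true start state
b2 = [0.0, 0.0, 1.0]   # badly misspecified prior
for _ in range(50):
    state = random.choices(range(3), weights=T[state])[0]
    obs = random.choices(range(3), weights=O[state])[0]
    b1, b2 = filter_step(b1, obs), filter_step(b2, obs)

tv = 0.5 * sum(abs(x - y) for x, y in zip(b1, b2))
print(tv)  # total-variation gap; near zero means the filter forgot its init
```

The paper's contribution is a quantitative version of this picture: under one-step observability, the contraction rate can be bounded without any assumptions on the transition dynamics.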
Keywords
Partially observable Markov decision processes, filter stability