Adaptive Discretization in Online Reinforcement Learning

Operations Research (2023)

Abstract
Performance guarantees for RL algorithms are typically stated for worst-case instances, which are pathological by design and not observed in meaningful applications. Moreover, many domains (such as computer systems and networking applications) have large state-action spaces and require algorithms to execute with low latency. This highlights a trifecta of goals for practical RL algorithms: low sample, storage, and computational complexity. In this work, we develop an algorithmic framework for nonparametric RL with data-driven adaptive discretization. Our framework has provably better sample, storage, and computational complexity than uniform discretization or kernel regression methods. Moreover, we highlight how the performance guarantees are minimax optimal with respect to a novel instance-specific complexity measure that captures structure in facility location and newsvendor models.

Discretization-based approaches to solving online reinforcement learning problems have been studied extensively in applications such as resource allocation and cache management. The two major questions in designing discretization-based algorithms are how to create the discretization and when to refine it. There are several experimental results investigating heuristic approaches to these questions but little theoretical treatment. In this paper, we provide a unified theoretical analysis of model-free and model-based, tree-based adaptive hierarchical partitioning methods for online reinforcement learning. We show how our algorithms take advantage of inherent problem structure by providing guarantees that scale with the "zooming" dimension instead of the ambient dimension, an instance-dependent quantity measuring the benignness of the optimal state-action value function Q_h^*. Many applications in computing systems and operations research require algorithms that compete on three facets: low sample complexity, mild storage requirements, and low computational burden for policy evaluation and training. Our algorithms are easily adapted to operating constraints, and our theory provides explicit bounds across each of the three facets.

Funding: This work is supported by funding from the National Science Foundation [Grants ECCS-1847393, DMS-1839346, CCF-1948256, and CNS-1955997] and the Army Research Laboratory [Grant W911NF-17-1-0094].

Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2396.
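The core idea, a tree-based partition of the state-action space that is refined only where the data concentrate, can be illustrated with a short sketch. The snippet below is not the paper's algorithm; it is a minimal, self-contained illustration of tree-based adaptive discretization on the unit square [0, 1] x [0, 1], with an assumed splitting rule (subdivide a region once its visit count reaches roughly the inverse square of its diameter) and an assumed optimism bonus. All names here (AdaptivePartition, Node, lipschitz) are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Axis-aligned region of the state-action square covered by this node.
    s_lo: float
    s_hi: float
    a_lo: float
    a_hi: float
    q: float = 1.0            # optimistic Q estimate (rewards assumed in [0, 1])
    visits: int = 0
    children: List["Node"] = field(default_factory=list)

    def diameter(self) -> float:
        return max(self.s_hi - self.s_lo, self.a_hi - self.a_lo)

    def contains_state(self, s: float) -> bool:
        return self.s_lo <= s <= self.s_hi


class AdaptivePartition:
    """Tree-based adaptive discretization of [0, 1] x [0, 1] (illustrative only)."""

    def __init__(self) -> None:
        self.root = Node(0.0, 1.0, 0.0, 1.0)

    def leaves(self, node: Optional[Node] = None) -> List[Node]:
        node = node or self.root
        if not node.children:
            return [node]
        return [leaf for child in node.children for leaf in self.leaves(child)]

    def select(self, s: float) -> Node:
        # Greedy selection: among leaves whose state interval contains s,
        # play the one with the largest optimistic Q value.
        relevant = [leaf for leaf in self.leaves() if leaf.contains_state(s)]
        return max(relevant, key=lambda leaf: leaf.q)

    def update(self, leaf: Node, target: float, lipschitz: float = 1.0) -> None:
        # Running-average update with an optimism bonus combining the
        # statistical error (shrinks with visits) and the discretization
        # error (Lipschitz constant times the region's diameter).
        leaf.visits += 1
        lr = 1.0 / leaf.visits
        bonus = math.sqrt(1.0 / leaf.visits) + lipschitz * leaf.diameter()
        leaf.q = (1.0 - lr) * leaf.q + lr * (target + bonus)
        # Refinement rule (assumed): split once the leaf has been visited
        # roughly diameter^{-2} times, i.e. once the statistical error is
        # no larger than the discretization error.
        if not leaf.children and leaf.visits >= (1.0 / leaf.diameter()) ** 2:
            self._split(leaf)

    def _split(self, leaf: Node) -> None:
        # Split the region into four equal children, each inheriting the
        # parent's Q estimate so no information is discarded.
        sm = 0.5 * (leaf.s_lo + leaf.s_hi)
        am = 0.5 * (leaf.a_lo + leaf.a_hi)
        leaf.children = [
            Node(lo_s, hi_s, lo_a, hi_a, q=leaf.q)
            for lo_s, hi_s in ((leaf.s_lo, sm), (sm, leaf.s_hi))
            for lo_a, hi_a in ((leaf.a_lo, am), (am, leaf.a_hi))
        ]
```

The splitting rule in this sketch captures the trade-off that motivates adaptive discretization: a region is subdivided only when its statistical error falls below its discretization error, so the partition stays coarse in rarely visited regions and refines only where the data (and the optimal Q-function's structure) warrant it, which is how storage and computation can scale with a zooming-type quantity rather than with a uniform grid over the ambient space.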
Keywords
Machine Learning and Data Science, reinforcement learning, metric spaces, adaptive discretization, online learning