On Robust Reinforcement Learning with Lipschitz-Bounded Policy Networks
CoRR (2024)
Abstract
This paper presents a study of robust policy networks in deep reinforcement
learning. We investigate the benefits of policy parameterizations that
naturally satisfy constraints on their Lipschitz bound, analyzing their
empirical performance and robustness on two representative problems: pendulum
swing-up and Atari Pong. We illustrate that policy networks with small
Lipschitz bounds are significantly more robust to disturbances, random noise,
and targeted adversarial attacks than unconstrained policies composed of
vanilla multi-layer perceptrons or convolutional neural networks. Moreover, we
find that choosing a policy parameterization with a non-conservative Lipschitz
bound and an expressive, nonlinear layer architecture gives the user much finer
control over the performance-robustness trade-off than existing
state-of-the-art methods based on spectral normalization.
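To make the spectral-normalization baseline mentioned above concrete, the sketch below shows one common way to cap an MLP policy's Lipschitz bound: spectrally normalize each linear layer and split a target bound `gamma` evenly across layers. This is an assumed, illustrative construction, not the paper's Lipschitz-bounded parameterization, and the names (`ScaledLinear`, `LipschitzMLPPolicy`, `gamma`) are hypothetical.

```python
# Minimal sketch (assumption, not the paper's method): an MLP policy whose global
# Lipschitz bound is approximately capped at `gamma` by spectral-normalizing each
# linear layer (gain <= 1 up to power-iteration accuracy) and rescaling by
# gamma**(1/num_layers). With 1-Lipschitz activations (tanh), the composition is
# then gamma-Lipschitz.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm


class ScaledLinear(nn.Module):
    """Spectrally normalized linear layer scaled to a fixed per-layer gain."""

    def __init__(self, in_features: int, out_features: int, gain: float):
        super().__init__()
        self.linear = spectral_norm(nn.Linear(in_features, out_features))
        self.gain = gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gain * self.linear(x)


class LipschitzMLPPolicy(nn.Module):
    """MLP policy with an (approximate) global Lipschitz bound of `gamma`."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64, gamma: float = 2.0):
        super().__init__()
        sizes = [obs_dim, hidden, hidden, act_dim]
        gain = gamma ** (1.0 / (len(sizes) - 1))  # split the bound across layers
        layers = []
        for i in range(len(sizes) - 1):
            layers.append(ScaledLinear(sizes[i], sizes[i + 1], gain))
            if i < len(sizes) - 2:
                layers.append(nn.Tanh())  # 1-Lipschitz activation
        self.net = nn.Sequential(*layers)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


if __name__ == "__main__":
    # Example dimensions chosen for a pendulum-like task (3 observations, 1 action).
    policy = LipschitzMLPPolicy(obs_dim=3, act_dim=1, gamma=2.0)
    print(policy(torch.randn(8, 3)).shape)  # torch.Size([8, 1])
```

Because the bound enters only through per-layer spectral norms, this style of constraint tends to be conservative; the paper's point is that less conservative, expressive Lipschitz-bounded parameterizations give finer control over the performance-robustness trade-off.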