Disentangling the Causes of Plasticity Loss in Neural Networks
CoRR (2024)
Abstract
Underpinning the past decades of work on the design, initialization, and
optimization of neural networks is a seemingly innocuous assumption: that the
network is trained on a stationary data distribution. In settings
where this assumption is violated, e.g. deep reinforcement learning, learning
algorithms become unstable and brittle with respect to hyperparameters and even
random seeds. One factor driving this instability is the loss of plasticity,
meaning that updating the network's predictions in response to new information
becomes more difficult as training progresses. While many recent works provide
analyses and partial solutions to this phenomenon, a fundamental question
remains unanswered: to what extent do known mechanisms of plasticity loss
overlap, and how can mitigation strategies be combined to best maintain the
trainability of a network? This paper addresses these questions, showing that
loss of plasticity can be decomposed into multiple independent mechanisms and
that, while intervening on any single mechanism is insufficient to avoid the
loss of plasticity in all cases, intervening on multiple mechanisms in
conjunction results in highly robust learning algorithms. We show that a
combination of layer normalization and weight decay is highly effective at
maintaining plasticity in a variety of synthetic nonstationary learning tasks,
and further demonstrate its effectiveness on naturally arising
nonstationarities, including reinforcement learning in the Arcade Learning
Environment.
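To make the proposed mitigation concrete, below is a minimal sketch of the combination the abstract describes: layer normalization inside the network plus decoupled weight decay in the optimizer, exercised on a synthetic nonstationary task. This is an illustration, not the authors' experimental setup; the teacher-network task, architecture, and all hyperparameters are assumptions chosen for brevity.

```python
# Illustrative sketch (assumed setup, not the paper's): LayerNorm + weight
# decay on a synthetic nonstationary regression task. Each "phase" replaces
# the target function, so maintaining plasticity means the network can still
# fit the new targets in later phases.
import torch
import torch.nn as nn

torch.manual_seed(0)


class NormalizedMLP(nn.Module):
    """MLP with layer normalization after each hidden linear layer."""

    def __init__(self, in_dim=32, hidden=256, out_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)


model = NormalizedMLP()
# AdamW provides the decoupled weight decay half of the combination.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

x = torch.randn(1024, 32)  # fixed inputs across all phases
for phase in range(10):
    # A freshly initialized random teacher defines this phase's targets,
    # creating an abrupt distribution shift at each phase boundary.
    with torch.no_grad():
        y = NormalizedMLP()(x)
    for step in range(500):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # If plasticity is maintained, end-of-phase loss stays low in late phases
    # rather than creeping upward as training accumulates.
    print(f"phase {phase}: final loss {loss.item():.4f}")
```

A useful control under these same assumptions is to rerun the loop with `nn.LayerNorm` removed and `weight_decay=0.0`, which is where one would expect the end-of-phase loss to degrade across phases if plasticity is being lost.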