Note On The Quadratic Penalties In Elastic Weight Consolidation
Proceedings of the National Academy of Sciences of the United States of America (2018)
Abstract
Catastrophic forgetting is an undesired phenomenon that occurs when neural networks are trained on different tasks sequentially. Elastic weight consolidation (EWC; ref. 1), published in PNAS, is a novel algorithm designed to safeguard against this. Despite its satisfying simplicity, EWC is remarkably effective. Motivated by Bayesian inference, EWC adds quadratic penalties to the loss function when learning a new task. The purpose of these penalties is to approximate the loss surface from previous tasks. The authors derive the penalty for the two-task case and then extrapolate to handling multiple tasks. I believe, however, that the penalties for multiple tasks are applied inconsistently. In ref. 1 a separate penalty is maintained for each task T, centered at θ_T*, the value of θ obtained after training on task T. When these penalties are combined (assuming λ_T = 1), the aggregate penalty is anchored at μ_T = (F_A + F_B … Email: fhuszar@twitter.com.
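The claim about where the aggregate penalty is anchored follows from completing the square: a sum of per-task quadratic penalties is itself a single quadratic whose minimum lies at the Fisher-weighted mean of the task optima. The sketch below illustrates this with hypothetical diagonal Fisher values and task optima (all numbers are made up for illustration; this is not the authors' code):

```python
import numpy as np

# Hypothetical diagonal Fisher information for two tasks A and B,
# and the parameter values theta_A*, theta_B* reached after each task.
F_A = np.array([2.0, 0.5])
F_B = np.array([1.0, 4.0])
theta_A = np.array([1.0, -1.0])
theta_B = np.array([3.0, 2.0])

def combined_penalty(theta):
    """Sum of the two per-task EWC quadratic penalties (with lambda_T = 1)."""
    return 0.5 * np.sum(F_A * (theta - theta_A) ** 2
                        + F_B * (theta - theta_B) ** 2)

# Completing the square: the combined penalty is a single quadratic
# anchored at the Fisher-weighted mean of the two task optima.
mu = (F_A * theta_A + F_B * theta_B) / (F_A + F_B)

# The gradient of the combined penalty vanishes at mu, confirming
# that mu is its minimizer.
grad_at_mu = F_A * (mu - theta_A) + F_B * (mu - theta_B)
```

Note that the anchor μ generally coincides with none of the individual task optima, which is the behavior the note examines.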