Note On The Quadratic Penalties In Elastic Weight Consolidation

Proceedings of the National Academy of Sciences of the United States of America (2018)

Cited 139 | Viewed 69
Abstract
Catastrophic forgetting is an undesired phenomenon which occurs when neural networks are trained on different tasks sequentially. Elastic weight consolidation (EWC; ref. 1), published in PNAS, is a novel algorithm designed to safeguard against this. Despite its satisfying simplicity, EWC is remarkably effective.

Motivated by Bayesian inference, EWC adds quadratic penalties to the loss function when learning a new task. The purpose of the penalties is to approximate the loss surface from previous tasks. The authors derive the penalty for the two-task case and then extrapolate to handling multiple tasks. I believe, however, that the penalties for multiple tasks are applied inconsistently.

In ref. 1 a separate penalty is maintained for each task T, centered at θ*_T, the value of θ obtained after training on task T. When these penalties are combined (assuming λ_T = 1), the aggregate penalty is anchored at μ_T = (F_A + F_B …

Email: fhuszar@twitter.com
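To make the combination step concrete: a sum of per-task quadratic penalties is itself quadratic, and its anchor is the precision-weighted mean of the per-task anchors. The two-task identity below is a standard completion-of-squares computation, written out under the abstract's assumption λ_T = 1, with F_T denoting the Fisher information for task T; the continuation of the truncated formula above is an inference from this identity, not text quoted from the note.

With each penalty defined as P_T(θ) = ½ (θ − θ*_T)ᵀ F_T (θ − θ*_T), in LaTeX:

\[
P_A(\theta) + P_B(\theta)
  = \tfrac{1}{2}\,(\theta - \mu)^\top (F_A + F_B)\,(\theta - \mu) + \mathrm{const},
\qquad
\mu = (F_A + F_B)^{-1}\bigl(F_A\,\theta^*_A + F_B\,\theta^*_B\bigr).
\]

This identity shows why combining separately maintained penalties moves the effective anchor away from any individual θ*_T, which is the behavior the note scrutinizes.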