Gradient Projection For Parameter-Efficient Continual Learning
arXiv (2024)
Abstract
Catastrophic forgetting poses the primary challenge in continual learning. Methods based on parameter-efficient tuning (PET) have recently demonstrated impressive performance in this setting. However, they still face a common problem: fine-tuning on consecutive, distinct tasks can disrupt the existing parameter distribution and lead to forgetting. Recent progress has mainly focused on empirically designing efficient tuning schemes, with little investigation of how forgetting arises, what criteria prevent it, or what theoretical support exists. Additionally, the unresolved trade-off between learning new content and protecting old knowledge further complicates these challenges. The gradient projection methodology restricts gradient updates to directions orthogonal to the old feature space, preventing the parameter distribution from being damaged during updates and significantly suppressing forgetting (both the condition and the projection are sketched after the abstract). Building on this, we reformulate Adapter, LoRA, Prefix, and Prompt tuning for the continual learning setting from the perspective of gradient projection, and propose a unified framework called Parameter Efficient Gradient Projection (PEGP). Based on the hypothesis that old tasks should yield the same outputs after the model is updated, we introduce orthogonal gradient projection into the different PET paradigms and theoretically demonstrate that the orthogonality condition on the gradient effectively resists forgetting in PET-based continual methods. Notably, PEGP is the first unified method to provide an anti-forgetting mechanism with mathematical justification across different tuning paradigms. We extensively evaluate our method with different backbones on diverse datasets, and experiments demonstrate its effectiveness in reducing forgetting in various incremental settings.
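
As a one-line illustration of the hypothesis stated in the abstract (notation ours, assuming a single linear map $h = Wx$ rather than any specific layer from the paper): requiring old-task outputs to be unchanged after an update $\Delta W$ means

\[
(W + \Delta W)\,x_{\text{old}} = W x_{\text{old}} \;\Longleftrightarrow\; \Delta W\, x_{\text{old}} = 0,
\]

so the update (and hence the gradient step) must lie in the orthogonal complement of the subspace spanned by the old features $x_{\text{old}}$.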
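
To make the projection concrete, below is a minimal PyTorch sketch of orthogonal gradient projection against a stored old-feature subspace. This is an illustration under our own simplifying assumptions, not the authors' PEGP implementation: old_feats, build_old_feature_basis, project_gradient, and the 0.95 energy threshold are hypothetical names and choices.

import torch

def build_old_feature_basis(old_feats, energy_threshold=0.95):
    # old_feats: (n_samples, dim) features collected from old tasks.
    # SVD of the (dim, n_samples) matrix; keep the leading left-singular
    # vectors that capture `energy_threshold` of the spectral energy.
    U, S, _ = torch.linalg.svd(old_feats.T, full_matrices=False)
    energy = torch.cumsum(S ** 2, dim=0) / torch.sum(S ** 2)
    k = int((energy < energy_threshold).sum().item()) + 1
    return U[:, :k]  # orthonormal basis of the old feature subspace, (dim, k)

def project_gradient(grad, basis):
    # Remove the component of `grad` lying in span(basis): for a weight W
    # with forward pass W x, requiring (W + dW) x = W x on old features x
    # means dW x = 0, so only the part of the gradient orthogonal to the
    # old feature subspace is kept.
    return grad - grad @ basis @ basis.T

# Illustrative use inside a training step, after the backward pass:
#   loss.backward()
#   tuned_layer.weight.grad = project_gradient(tuned_layer.weight.grad, basis)
#   optimizer.step()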