LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views
CoRR (2024)
Abstract
Fine-tuning is becoming widely used for leveraging the power of pre-trained
foundation models in new downstream tasks. While there are many successes of
fine-tuning on various tasks, recent studies have observed challenges in the
generalization of fine-tuned models to unseen distributions (i.e.,
out-of-distribution; OOD). To improve OOD generalization, some previous studies
identify the limitations of fine-tuning data and regulate fine-tuning to
preserve the general representation learned from pre-training data. However,
potential limitations in the pre-training data and models are often ignored. In
this paper, we contend that overly relying on the pre-trained representation
may hinder fine-tuning from learning essential representations for downstream
tasks and thus hurt its OOD generalization. This can be especially catastrophic
when new tasks come from (sub)domains that differ from the pre-training data.
To address the issues in both pre-training and fine-tuning data, we propose a
novel generalizable fine-tuning method, LEVI, in which the pre-trained model is
adaptively ensembled layer-wise with a small task-specific model, while
preserving training and inference efficiency. By combining two complementary
models, LEVI effectively suppresses problematic features in both the
fine-tuning data and the pre-trained model, and preserves features useful for new
tasks. Broad experiments with large language and vision models show that LEVI
greatly improves fine-tuning generalization by emphasizing different views
from fine-tuning data and pre-trained features.
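The abstract describes the method only at a high level. The following PyTorch snippet is a minimal sketch of what a layer-wise ensemble of a frozen pre-trained model with a small task-specific model could look like; the class and parameter names (LayerwiseEnsemble, small_dim, gates) and the sigmoid-gated mixing rule are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of a layer-wise ensemble of two models (assumption-based,
# not the paper's implementation).
import torch
import torch.nn as nn

class LayerwiseEnsemble(nn.Module):
    """Combine each frozen pre-trained layer with a small task-specific
    layer through a learned, per-layer gate."""
    def __init__(self, pretrained_layers, hidden_dim, small_dim=64):
        super().__init__()
        self.pretrained_layers = pretrained_layers          # e.g. frozen backbone blocks
        for p in self.pretrained_layers.parameters():
            p.requires_grad = False                         # keep pre-trained weights fixed
        self.small_layers = nn.ModuleList([                 # small task-specific model
            nn.Sequential(nn.Linear(hidden_dim, small_dim),
                          nn.GELU(),
                          nn.Linear(small_dim, hidden_dim))
            for _ in pretrained_layers])
        self.gates = nn.Parameter(torch.zeros(len(pretrained_layers)))  # per-layer mixing weights

    def forward(self, x):
        for layer, small, gate in zip(self.pretrained_layers, self.small_layers, self.gates):
            a = torch.sigmoid(gate)                         # adaptive weight in (0, 1)
            x = a * layer(x) + (1 - a) * small(x)           # layer-wise ensemble of the two views
        return x

# Usage with stand-in "pre-trained" layers:
backbone = nn.ModuleList([nn.Linear(128, 128) for _ in range(4)])
model = LayerwiseEnsemble(backbone, hidden_dim=128)
out = model(torch.randn(2, 128))
print(out.shape)  # torch.Size([2, 128])
```

The gates start at zero (sigmoid gives 0.5), so the two views contribute equally at initialization and the balance is learned per layer during fine-tuning; only the small layers and the gates are trainable in this sketch.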