Merging Multi-Task Models via Weight-Ensembling Mixture of Experts
CoRR(2024)
Abstract
Merging task-specific Transformer-based models trained on different tasks
yields a single unified model that can execute all the tasks concurrently.
Previous methods, exemplified by task arithmetic, have proven both
effective and scalable. These methods have primarily focused on seeking a
static optimal solution within the original model parameter space. A notable
challenge is mitigating the interference between parameters of different
models, which can substantially deteriorate performance. In this paper, we
propose to merge most of the parameters while upscaling the MLP of the
Transformer layers to a weight-ensembling mixture of experts (MoE) module,
which can dynamically integrate shared and task-specific knowledge based on the
input, thereby providing a more flexible solution that can adapt to the
specific needs of each instance. Our key insight is that by identifying and
separating shared knowledge and task-specific knowledge, and then dynamically
integrating them, we can mitigate the parameter interference problem to a great
extent. We conduct conventional multi-task model merging experiments and
evaluate the generalization and robustness of our method. The results
demonstrate its effectiveness and provide a comprehensive understanding of
the approach. The code is available at
https://anonymous.4open.science/r/weight-ensembling_MoE-67C9/
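
The contrast the abstract draws between a static merge and an input-dependent one can be illustrated with a minimal sketch. It is not the authors' implementation: parameters are flattened into plain Python lists, the helper names (task_vector, task_arithmetic, route, ensemble_mlp_weights) are hypothetical, and the single linear router with a softmax is an assumption made only for illustration.

```python
import math


def task_vector(finetuned, pretrained):
    """Difference between fine-tuned and pre-trained parameters."""
    return [f - p for f, p in zip(finetuned, pretrained)]


def task_arithmetic(pretrained, finetuned_models, scale=0.3):
    """Static merge: pre-trained weights plus scaled sum of task vectors."""
    merged = list(pretrained)
    for finetuned in finetuned_models:
        for i, delta in enumerate(task_vector(finetuned, pretrained)):
            merged[i] += scale * delta
    return merged


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(x, router_weights):
    """Hypothetical linear router: one score per task, then softmax (assumed)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    return softmax(logits)


def ensemble_mlp_weights(x, pretrained_mlp, finetuned_mlps, router_weights):
    """Dynamic merge: weight each task vector by an input-dependent gate."""
    gates = route(x, router_weights)
    merged = list(pretrained_mlp)
    for gate, finetuned in zip(gates, finetuned_mlps):
        for i, delta in enumerate(task_vector(finetuned, pretrained_mlp)):
            merged[i] += gate * delta
    return merged, gates


# Toy parameters: two "tasks", two-parameter "MLP" (illustrative values only).
pretrained = [0.0, 1.0]
finetuned_a = [1.0, 1.0]
finetuned_b = [0.0, 3.0]
router = [[1.0, 0.0], [0.0, 1.0]]

static = task_arithmetic(pretrained, [finetuned_a, finetuned_b], scale=0.5)
dynamic, gates = ensemble_mlp_weights(
    [10.0, 0.0], pretrained, [finetuned_a, finetuned_b], router
)
print(static, dynamic, gates)
```

In this sketch the static merge returns the same parameters for every input, whereas the ensembled MLP weights shift toward whichever task vector the router favors for the given input. In the setting the abstract describes, only the MLP of each Transformer layer would be treated this way, with the remaining parameters merged statically.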