Wisdom of Committee: Distilling from Foundation Model to Specialized Application Model
CoRR (2024)
Abstract
Recent advancements in foundation models have yielded impressive performance
across a wide range of tasks. Meanwhile, for specific applications,
practitioners have been developing specialized application models. To enjoy the
benefits of both kinds of models, one natural path is to transfer the knowledge
in foundation models into specialized application models, which are generally
more efficient for serving. Techniques from knowledge distillation may be
applied here, where the application model learns to mimic the foundation model.
However, specialized application models and foundation models differ
substantially: they have large gaps in capacity, employ distinct architectures,
take input features from different modalities, and are optimized on different
distributions. These differences in model characteristics pose significant
challenges for distillation methods. In this work, we propose creating a
teaching committee comprising both foundation model teachers and complementary
teachers. Complementary teachers possess model characteristics akin to the
student's, aiming to bridge the gap between the foundation model and
specialized application models for a smoother knowledge transfer. Further, to
accommodate the dissimilarity among the teachers in the committee, we introduce
DiverseDistill, which allows the student to understand the expertise of each
teacher and extract task knowledge. Our evaluations demonstrate that adding
complementary teachers enhances student performance. Finally, DiverseDistill
consistently outperforms baseline distillation methods, regardless of the
choice of teachers, yielding significantly improved student performance.
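
To make the committee idea concrete, below is a minimal sketch of multi-teacher
distillation in PyTorch: a standard task loss combined with a weighted
soft-target KL term toward each teacher in the committee. The function name,
the per-teacher weights, and all hyperparameters are illustrative assumptions;
the abstract does not specify DiverseDistill's actual mechanism for modeling
teacher expertise.

import torch
import torch.nn.functional as F

def committee_distill_loss(student_logits, teacher_logits_list, labels,
                           teacher_weights, T=2.0, alpha=0.5):
    # Hypothetical committee loss: supervised task loss plus a per-teacher
    # soft-target KL term, weighted by each teacher's (fixed or learned) weight.
    task_loss = F.cross_entropy(student_logits, labels)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    distill_loss = 0.0
    for w, t_logits in zip(teacher_weights, teacher_logits_list):
        p_teacher = F.softmax(t_logits / T, dim=-1)
        # KL(teacher || student), scaled by T^2 as is standard for soft targets
        distill_loss = distill_loss + w * T * T * F.kl_div(
            log_p_student, p_teacher, reduction="batchmean")
    return alpha * task_loss + (1.0 - alpha) * distill_loss

In this sketch, teacher_weights could be a fixed vector, or, closer to the
paper's stated goal of letting the student understand each teacher's expertise,
the softmaxed output of a small gating network over the teachers.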