Learning to Project for Cross-Task Knowledge Distillation
arXiv (2024)
Abstract
Traditional knowledge distillation (KD) relies on a proficient teacher
trained on the target task, which is not always available. In this setting,
cross-task distillation can be used, enabling the use of any teacher model
trained on a different task. However, many KD methods prove ineffective when
applied to this cross-task setting. To address this limitation, we propose a
simple modification: the use of an inverted projection. We show that this
drop-in replacement for a standard projector is effective by learning to
disregard any task-specific features which might degrade the student's
performance. We find that this simple modification is sufficient for extending
many KD methods to the cross-task setting, where the teacher and student tasks
can be very different. In doing so, we obtain up to a 1.9% improvement in the
cross-task setting compared to the traditional projection, at no additional
cost. Our method can obtain significant performance improvements (up to 7%)
when using even a randomly-initialised teacher on various tasks such as depth
estimation, image translation, and semantic segmentation, despite the lack of
any learned knowledge to transfer. To provide conceptual and analytical
insights into this result, we show that using an inverted projection allows the
distillation loss to be decomposed into a knowledge transfer and a spectral
regularisation component. Through this analysis we are additionally able to
propose a novel regularisation loss that allows teacher-free distillation,
enabling performance improvements of up to 8.57% with no additional
training costs.
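To make the central idea concrete, the following is a minimal PyTorch sketch, assuming the "inverted projection" reverses the usual direction of the learned projector so that teacher features are mapped into the student's feature space (rather than student features being mapped into the teacher's). The dimensions, the L2 feature-matching loss, and the function names (`standard_kd_loss`, `inverted_kd_loss`) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a standard vs. an inverted projector for feature
# distillation; shapes and the MSE feature-matching loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_student, d_teacher = 256, 1024  # assumed feature dimensions

# Standard projection: map student features up to the teacher's dimension.
proj_std = nn.Linear(d_student, d_teacher, bias=False)

# Inverted projection: map teacher features down to the student's dimension,
# so the learned map can suppress teacher directions that are task-specific.
proj_inv = nn.Linear(d_teacher, d_student, bias=False)

def standard_kd_loss(f_s, f_t):
    # || P(f_s) - f_t ||^2 : the student must match every teacher feature.
    return F.mse_loss(proj_std(f_s), f_t.detach())

def inverted_kd_loss(f_s, f_t):
    # || f_s - P(f_t) ||^2 : the projector can attenuate task-specific
    # teacher components before the student is asked to match them.
    return F.mse_loss(f_s, proj_inv(f_t.detach()))

if __name__ == "__main__":
    f_s = torch.randn(8, d_student)  # student features (batch of 8)
    f_t = torch.randn(8, d_teacher)  # teacher features
    print(standard_kd_loss(f_s, f_t).item(), inverted_kd_loss(f_s, f_t).item())
```

Because both projectors are plain linear layers of comparable size, swapping one for the other adds no training cost, which is consistent with the drop-in-replacement claim made in the abstract.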