Unpacking the Gap Box Against Data-Free Knowledge Distillation.

IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)

Abstract
Data-free knowledge distillation (DFKD) improves a student model (S) by mimicking the class probabilities of a pre-trained teacher model (T) without access to the original training data. Under this setting, an ideal scenario is that T helps a generator (G) produce "good" samples that maximally benefit S. However, existing methods suffer from non-ideal generated samples under the disturbance of the gap (i.e., either too large or too small) between the class probabilities of T and S; for example, generated samples with too large a gap may carry excessive information for S, while too small a gap leaves limited knowledge in the samples, resulting in poor generalization. Meanwhile, these methods fail to judge the "goodness" of the generated samples for S, since the fixed T is not necessarily ideal. In this paper, we aim to answer what is inside the gap box, together with how to yield "good" generated samples for DFKD. To this end, we propose a Gap-Sensitive Sample Generation (GapSSG) approach by revisiting the empirical distilled risk from a data-free perspective, which confirms the existence of an ideal teacher (T*) and theoretically implies that: (1) the gap disturbance originates from the mismatch between T and T*, hence the class probabilities of T enable an approximation to those of T*; and (2) "good" samples should maximally benefit S via T's class probabilities, since T* is unknown. Accordingly, we unpack the gap box between T and S into two findings: an inherent gap that perceives T and T*, and a derived gap that monitors S and T*. Benefiting from the derived gap, which focuses on the adaptability of generated samples to S, we track the student's training route (a series of training epochs) to capture the category distribution of S; upon this, a regulatory factor is further devised to approximate T* over the inherent gap, so as to generate "good" samples for S. Furthermore, during the distillation process, a sample-balanced strategy is devised to tackle the overfitting and missing-knowledge issues between the generated partial and critical samples when training G. Theoretical and empirical studies verify the advantages of GapSSG over state-of-the-art methods. Our code is available at https://github.com/hfutqian/GapSSG.
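For readers unfamiliar with the data-free distillation loop summarized above, the sketch below illustrates one student update on generator-produced samples, with a per-sample weight based on the teacher-student gap. This is a minimal illustrative sketch, not the authors' GapSSG implementation (see the GitHub link); the network shapes, the toy generator, and the exponential gap-weighting rule are simplifying assumptions made only to make the gap-sensitive idea concrete.

```python
# Minimal sketch of a data-free KD step with a gap-sensitive sample weight.
# NOT the authors' GapSSG code; shapes, generator, and weighting are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NZ, NUM_CLASSES = 100, 10

generator = nn.Sequential(nn.Linear(NZ, 256), nn.ReLU(), nn.Linear(256, 32 * 32))
teacher = nn.Sequential(nn.Linear(32 * 32, 256), nn.ReLU(), nn.Linear(256, NUM_CLASSES))
student = nn.Sequential(nn.Linear(32 * 32, 64), nn.ReLU(), nn.Linear(64, NUM_CLASSES))
teacher.eval()  # the pre-trained teacher is frozen in DFKD

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)

def distill_step(batch_size=64, tau=4.0):
    """One student update on synthetic samples, weighted by the T-S gap."""
    z = torch.randn(batch_size, NZ)
    x = generator(z).detach()                      # synthetic samples (G fixed here)
    with torch.no_grad():
        p_t = F.softmax(teacher(x) / tau, dim=1)   # teacher class probabilities
    log_p_s = F.log_softmax(student(x) / tau, dim=1)
    # Per-sample gap between teacher and student class probabilities (KL divergence).
    gap = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1)
    # Down-weight samples whose gap is extreme (too large or too small) relative to
    # the batch average -- a crude stand-in for gap-sensitive sample selection.
    weight = torch.exp(-(gap - gap.mean()).abs())
    loss = (weight.detach() * gap).mean() * tau ** 2
    opt_s.zero_grad()
    loss.backward()
    opt_s.step()
    return loss.item()

for step in range(3):
    print(f"step {step}: distillation loss = {distill_step():.4f}")
```

In the full method, the generator is also trained and the weighting is driven by the inherent and derived gaps described in the abstract; the sketch only shows where such a per-sample factor would enter the student's distillation loss.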
Keywords
Data-free knowledge distillation, derived gap, empirical distilled risk, generative model, inherent gap