FitNets: Hints for Thin Deep Nets
International Conference on Learning Representations (ICLR), 2015.
While depth tends to improve network performances, it also makes gradient-based training more difficult since deeper networks tend to be more non-linear. The recently proposed knowledge distillation approach is aimed at obtaining small and fast-to-execute models, and it has shown that a student network could imitate the soft output of a...