Understanding Why Label Smoothing Degrades Selective Classification and How to Fix It
CoRR (2024)
Abstract
Label smoothing (LS) is a popular regularisation method for training deep
neural network classifiers due to its effectiveness in improving test accuracy
and its simplicity in implementation. "Hard" one-hot labels are "smoothed" by
uniformly distributing probability mass to other classes, reducing overfitting.
In this work, we reveal that LS negatively affects selective classification
(SC), where the aim is to reject misclassifications using a model's predictive
uncertainty. We first demonstrate empirically across a range of tasks and
architectures that LS leads to a consistent degradation in SC. We then explain
this by analysing logit-level gradients, showing that LS exacerbates
overconfidence and underconfidence by regularising the max logit more when the
probability of error is low, and less when the probability of error is high.
This elucidates previously reported experimental results where strong
classifiers underperform in SC. We then demonstrate the empirical effectiveness
of logit normalisation for recovering lost SC performance caused by LS.
Furthermore, based on our gradient analysis, we explain why such normalisation
is effective. We will release our code shortly.
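
For concreteness, the smoothing step described in the abstract can be sketched as follows. This is a minimal PyTorch illustration; the function name smooth_labels and the smoothing factor alpha are our own notation rather than the paper's, and some formulations instead spread the mass over only the K - 1 non-target classes.

import torch
import torch.nn.functional as F

def smooth_labels(targets: torch.Tensor, num_classes: int, alpha: float = 0.1) -> torch.Tensor:
    # Keep (1 - alpha) probability mass on the true class and spread alpha
    # uniformly over all classes, turning hard one-hot targets into soft ones.
    one_hot = F.one_hot(targets, num_classes).float()
    return (1.0 - alpha) * one_hot + alpha / num_classes

# Example: 3 classes, hard labels [0, 2] -> rows [0.933, 0.033, 0.033] and
# [0.033, 0.033, 0.933].
print(smooth_labels(torch.tensor([0, 2]), num_classes=3, alpha=0.1))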
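Selective classification itself is easy to sketch: predict as usual, but abstain whenever a confidence score falls below a threshold. The maximum softmax probability used below is only the standard baseline score; the names selective_classify and threshold are ours, and the paper may evaluate other uncertainty scores.

import torch

@torch.no_grad()
def selective_classify(logits: torch.Tensor, threshold: float):
    # Confidence is the maximum softmax probability; predictions whose
    # confidence falls below the threshold are rejected (abstained on).
    probs = logits.softmax(dim=-1)
    confidence, prediction = probs.max(dim=-1)
    accept = confidence >= threshold  # True -> keep, False -> reject
    return prediction, confidence, accept

# Example: two samples, reject the uncertain one at threshold 0.7.
logits = torch.tensor([[4.0, 0.0, 0.0],   # confident
                       [0.6, 0.4, 0.2]])  # uncertain
print(selective_classify(logits, threshold=0.7))

Sweeping the threshold traces out a risk-coverage trade-off, which is how SC performance is typically summarised.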
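The "logit-level gradients" argument can also be made concrete. For cross-entropy with soft targets, the gradient with respect to the logits is softmax(z) minus the smoothed target, so on a confidently correct example the gradient of the max logit turns positive (LS actively pushes it down), while on an uncertain example it stays negative. This closed form is standard; the notation below (ls_grad_on_logits, alpha) is ours and is only a sketch of the kind of analysis the abstract refers to, not the paper's exact derivation.

import torch
import torch.nn.functional as F

def ls_grad_on_logits(logits: torch.Tensor, target: torch.Tensor, alpha: float = 0.1) -> torch.Tensor:
    # Gradient of label-smoothed cross-entropy w.r.t. the logits:
    # softmax(z) - q, where q is the smoothed target distribution.
    num_classes = logits.shape[-1]
    q = (1.0 - alpha) * F.one_hot(target, num_classes).float() + alpha / num_classes
    return logits.softmax(dim=-1) - q

# Confident, correct prediction (low error probability): the gradient on the
# max logit is positive (~ +0.066), so LS pushes that logit down.
print(ls_grad_on_logits(torch.tensor([8.0, 0.0, 0.0]), torch.tensor(0)))

# Uncertain prediction (higher error probability): the gradient on the max
# logit is still clearly negative (~ -0.43), so it is regularised far less.
print(ls_grad_on_logits(torch.tensor([1.0, 0.5, 0.0]), torch.tensor(0)))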
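Finally, the post-hoc logit normalisation that the abstract reports as recovering SC performance could look roughly like the sketch below. L2 normalisation with a temperature tau is one common variant; the function name and defaults are assumptions rather than the paper's exact method.

import torch

@torch.no_grad()
def normalised_confidence(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    # Divide the logits by their (temperature-scaled) L2 norm before the
    # softmax, then take the max probability as the confidence score.
    norms = logits.norm(p=2, dim=-1, keepdim=True).clamp_min(1e-12)
    return (logits / (tau * norms)).softmax(dim=-1).max(dim=-1).values

# Drop-in use for selective classification: rank or threshold test samples by
# normalised_confidence(logits) instead of the raw maximum softmax probability.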