Subspace Defense: Discarding Adversarial Perturbations by Learning a Subspace for Clean Signals
arXiv (2024)
Abstract
Deep neural networks (DNNs) are notoriously vulnerable to adversarial attacks
that add carefully crafted perturbations to normal examples to fool DNNs. To
better understand such attacks, a characterization of the features carried by
adversarial examples is needed. In this paper, we tackle this challenge by
inspecting the subspaces of sample features through spectral analysis. We first
empirically show that the features of clean signals and those of adversarial
perturbations are redundant and respectively span low-dimensional linear
subspaces with minimal overlap, and that classical low-dimensional subspace
projection can suppress the perturbation features lying outside the subspace of clean
signals. This makes it possible for DNNs to learn a subspace where only
features of clean signals exist while those of perturbations are discarded,
which makes adversarial examples easier to identify. To suppress the
residual perturbations that are inevitable in subspace learning, we propose an
independence criterion to disentangle clean signals from perturbations.
Experimental results show that the proposed strategy enables the model to
inherently suppress adversaries, which not only boosts model robustness but
also motivates new directions for effective adversarial defense.
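As a rough illustration of the subspace argument in the abstract, the sketch below builds synthetic features that lie in a low-dimensional linear subspace, estimates that subspace with PCA, and projects perturbed features onto it. The feature dimension, the rank k = 8, the noise scale, and all variable names are illustrative assumptions; this is not the paper's learned-subspace method, only the classical projection baseline it references.

```python
import numpy as np

# Illustration of the claim that clean features concentrate in a
# low-dimensional linear subspace, so projecting onto that subspace
# suppresses most of an added perturbation. All sizes are assumptions.

rng = np.random.default_rng(0)

d, n, k = 128, 1000, 8                       # feature dim, samples, clean-subspace rank
basis = np.linalg.qr(rng.normal(size=(d, k)))[0]  # orthonormal basis of the clean subspace
clean = rng.normal(size=(n, k)) @ basis.T         # clean features living in that subspace
noise = 0.5 * rng.normal(size=(n, d))             # stand-in for adversarial perturbations
adv = clean + noise

# Estimate the clean subspace from clean data via PCA
# (top-k right singular vectors of the centered feature matrix).
_, _, vt = np.linalg.svd(clean - clean.mean(0), full_matrices=False)
P = vt[:k].T @ vt[:k]                             # projector onto the estimated subspace

# Projection keeps the clean component but discards the ~(1 - k/d)
# fraction of perturbation energy lying outside the subspace.
print(f"perturbation energy before projection: {np.linalg.norm(adv - clean):.1f}")
print(f"perturbation energy after projection:  {np.linalg.norm(adv @ P - clean):.1f}")
```

Running this, the post-projection residual is roughly sqrt(k/d) of the original, which is the geometric intuition behind discarding off-subspace perturbation features.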
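The abstract mentions an independence criterion for disentangling clean signals from perturbations but does not specify it here. A standard choice for such a criterion is the Hilbert-Schmidt Independence Criterion (HSIC); the sketch below uses a biased Gaussian-kernel HSIC estimator purely as an assumed stand-in, with illustrative data and bandwidth.

```python
import numpy as np

def rbf_gram(x: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian-kernel Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic(x: np.ndarray, y: np.ndarray) -> float:
    """Biased HSIC estimate; near zero when x and y are statistically independent."""
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return float(np.trace(rbf_gram(x) @ h @ rbf_gram(y) @ h)) / (n - 1) ** 2

rng = np.random.default_rng(0)
s = rng.normal(size=(256, 4))                    # stand-in for clean-signal features
independent = rng.normal(size=(256, 4))          # perturbation features independent of s
entangled = s + 0.1 * rng.normal(size=(256, 4))  # perturbation features leaking the signal
print("HSIC(signal, independent):", hsic(s, independent))  # ~0: disentangled
print("HSIC(signal, entangled):  ", hsic(s, entangled))    # much larger: dependent
```

Used as a training penalty between the learned clean-signal features and the discarded residual, a term like this pushes the two representations toward statistical independence, which matches the disentanglement goal the abstract describes.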