Evaluating the efficacy of different adversarial training strategies

Roger Sengphanith, Diego Marez, Jane Berk, Shibin Parameswaran

Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications V (2023)

Abstract
Adversarial training (AT) is considered the most effective strategy for defending a machine learning model against adversarial attacks. There are many different methods to perform AT, but the underlying principle is the same: augment the training data with adversarial examples. In this work, we investigate the efficacy of four different adversarial example generation strategies for AT of a given classification model. The four methods represent different categories of attack and data. Specifically, two of the adversarial generation algorithms perform attacks in the pixel domain, while the other two operate in the latent space of the data. Along a second axis, two of the methods generate adversarial data samples designed to lie near the model's decision boundaries, while the other two generate generic adversarial examples (not necessarily at the boundary). The adversarial examples from these methods are used to adversarially train models on MNIST and CIFAR-10. In the absence of a single good metric for measuring the robustness of a model, capturing the effect of AT with one number is a challenge. Hence, we evaluate the robustness improvements of the adversarially trained models using a variety of empirical metrics introduced in the literature: the local Lipschitz constant of the network (CLEVER), smoothness of decision boundaries, robustness to adversarial perturbations, and defense transferability.
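The abstract's core principle, augmenting training data with adversarial examples, can be made concrete with a short sketch. The following is a minimal PyTorch illustration of pixel-domain AT using a PGD attack; it is not the paper's exact method, and the model, loader, and hyperparameter values are illustrative assumptions.

```python
# Minimal sketch of pixel-domain adversarial training with PGD.
# Hyperparameters (eps, alpha, steps) are illustrative, not the paper's setup.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Generate L-infinity PGD adversarial examples around clean inputs x."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball and valid pixel range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_train_epoch(model, loader, optimizer, device="cpu"):
    """One epoch of AT: replace each clean batch with adversarial examples."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        optimizer.zero_grad()
        # Train on the adversarial batch; some AT variants mix clean and
        # adversarial samples instead of training on adversarial data alone.
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```

Swapping `pgd_attack` for a latent-space or boundary-seeking generator yields the other AT variants the abstract compares, while the training loop stays the same.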
Keywords
adversarial training, robustness metrics, pixel domain attacks, latent space attacks, boundary smoothness, boundary attacks, generalizable ensemble adversarial training, defense transferability