SoK: Analyzing Adversarial Examples: A Framework to Study Adversary Knowledge
CoRR (2024)
Abstract
Adversarial examples are malicious inputs to machine learning models that
trigger a misclassification. Although this type of attack has been studied for
close to a decade, we find a lack of study and formalization of adversary
knowledge when mounting attacks. This has yielded a complex space of
attack research with hard-to-compare threat models and attacks. We focus on the
image classification domain and provide a theoretical framework to study
adversary knowledge inspired by work in order theory. We present an adversarial
example game, inspired by cryptographic games, to standardize attacks. We
survey recent attacks in the image classification domain and classify their
adversary's knowledge in our framework. From this systematization, we compile
results that both confirm existing beliefs about adversary knowledge, such as
the potency of information about the attacked model, and allow us to
derive new conclusions on the difficulty associated with the white-box and
transferable threat models, for example, that transferable attacks might not be
as difficult as previously thought.