Benchmarking the Benchmarks.

Marc Miltenberger, Steven Arzt, Philipp Holzinger, Julius Näumann

AsiaCCS (2023)

Abstract

Over the years, security researchers have developed a broad spectrum of automatic code scanners that aim to find security vulnerabilities in applications. Security benchmarks are commonly used to evaluate novel scanners or program analysis techniques. Each benchmark consists of multiple positive test cases that reflect typical implementations of vulnerabilities, as well as negative test cases that reflect secure implementations without security flaws. Based on this ground truth, researchers can demonstrate the recall and precision of their novel contributions. However, as we found, existing security benchmarks are often underspecified with respect to their underlying assumptions and threat models. This may lead to misleading evaluation results when testing code scanners, since it requires the scanner to follow unclear and sometimes even contradictory assumptions. To help improve the quality of benchmarks, we propose SecExploitLang, a specification language that allows the authors of benchmarks to specify security assumptions along with their test cases. We further present Exploiter, a tool that can automatically generate exploit code based on a test case and its SecExploitLang specification to demonstrate the correctness of the test case. We created SecExploitLang specifications for two common security benchmarks and used Exploiter to evaluate the adequacy of their test case implementations. Our results show clear shortcomings in both benchmarks, i.e., a significant number of positive test cases turn out to be unexploitable, and even some negative test case implementations turn out to be exploitable. As we explain, the reasons for this include implementation defects, as well as design flaws, which impact the meaningfulness of evaluations that were based on them. Our work shall highlight the importance of thorough benchmark design and evaluation, and the concepts and tools we propose shall facilitate this task.
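
To make the role of the ground truth concrete, the following minimal Python sketch shows how precision and recall are typically derived from a benchmark's positive and negative test cases, and how the reported numbers shift when labels turn out to be wrong. It is an illustration only, not the paper's tooling: the names `BenchmarkCase` and `precision_recall`, the four test cases, and the scanner output are all hypothetical, and the sketch does not use SecExploitLang or Exploiter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCase:
    """Hypothetical benchmark entry: a name plus its ground-truth label."""
    name: str
    vulnerable: bool  # True = positive test case, False = negative test case


def precision_recall(cases, flagged):
    """Compute (precision, recall) of a scanner against labelled cases.

    `flagged` is the set of test case names the scanner reported as vulnerable.
    """
    tp = sum(1 for c in cases if c.vulnerable and c.name in flagged)
    fp = sum(1 for c in cases if not c.vulnerable and c.name in flagged)
    fn = sum(1 for c in cases if c.vulnerable and c.name not in flagged)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Labels as claimed by a (hypothetical) benchmark.
claimed = [
    BenchmarkCase("t1", True),
    BenchmarkCase("t2", True),   # assumed to be unexploitable in practice
    BenchmarkCase("t3", False),  # assumed to be exploitable in practice
    BenchmarkCase("t4", False),
]

# The same cases with labels corrected under the assumptions above.
corrected = [
    BenchmarkCase("t1", True),
    BenchmarkCase("t2", False),
    BenchmarkCase("t3", True),
    BenchmarkCase("t4", False),
]

# Hypothetical scanner output: it flags t1 and t3.
flagged = {"t1", "t3"}

claimed_scores = precision_recall(claimed, flagged)
corrected_scores = precision_recall(corrected, flagged)
print("claimed labels:  ", claimed_scores)
print("corrected labels:", corrected_scores)
```

In this constructed example the same scanner output scores 0.5 precision and 0.5 recall against the claimed labels but 1.0 on both against the corrected labels, which is the kind of distortion that mislabelled test cases can introduce.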

Keywords

security benchmarks, exploits
