Not All Coverage Measurements Are Equal: Fuzzing by Coverage Accounting for Input Prioritization

NDSS (2020)

Abstract
Coverage-based fuzzing has been actively studied and widely adopted for finding vulnerabilities in real-world software applications. Guided by coverage information such as statement coverage and transition coverage, a coverage-based fuzzer mutates inputs to cover more code and thus find more vulnerabilities, without requiring prerequisite information such as the input format. Current coverage-based fuzzing tools treat all covered code equally: every input that contributes new statements or transitions is kept for future mutation, regardless of what those statements or transitions are and how much they impact security. Although this design is reasonable from the perspective of software testing, which aims at full code coverage, it is inefficient for vulnerability discovery because 1) current techniques are still unable to reach full coverage within a reasonable amount of time, and 2) we always want to discover vulnerabilities early so that they can be fixed promptly. Even worse, because they do not discriminate between kinds of coverage, current fuzzing tools are susceptible to recent anti-fuzzing techniques and become much less effective at finding vulnerabilities in programs protected by anti-fuzzing schemes. To address the limitation caused by treating all coverage equally, we propose coverage accounting, a novel approach that evaluates coverage by its security impact. Coverage accounting evaluates edges using three metrics defined at three different levels: function, loop, and basic block. Based on the proposed metrics, we design a new scheme to prioritize fuzzing inputs and develop TortoiseFuzz, a greybox fuzzer for finding memory corruption vulnerabilities. We evaluated TortoiseFuzz on 30 real-world applications and compared it with 6 state-of-the-art greybox and hybrid fuzzers: AFL, AFLFast, FairFuzz, MOPT, QSYM, and Angora. Statistically, TortoiseFuzz found more vulnerabilities than 5 of the 6 fuzzers (AFL, AFLFast, FairFuzz, MOPT, and Angora), and it achieved results comparable to QSYM while consuming only around 2% of QSYM's memory on average. We also compared the coverage accounting metrics with two other metrics, AFL-Sensitive and LEOPARD, and TortoiseFuzz performed significantly better than both in finding vulnerabilities. Furthermore, we applied the coverage accounting metrics to QSYM and observed that coverage accounting increases the number of discovered vulnerabilities by 28.6% on average. TortoiseFuzz found 20 zero-day vulnerabilities, 15 of which have been confirmed with CVE identifiers.
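
The prioritization idea in the abstract can be illustrated with a minimal sketch. This is not TortoiseFuzz's implementation: the abstract does not say how the function, loop, and basic-block metrics are computed or combined, so the per-edge scores below are hypothetical placeholders, and `SeedInput`, `prioritize`, and `edge_score` are names invented for illustration. The sketch only shows the contrast with equal treatment of coverage: inputs that add new edges are still kept, but they are ordered by the assumed security score of the edges they add.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeedInput:
    """Hypothetical record of one fuzzing input and the edges it exercises."""
    name: str
    edges: frozenset


def prioritize(inputs, edge_score):
    """Order inputs that add new coverage, highest-scoring new edges first.

    edge_score maps an edge id to an assumed security-impact score
    (a stand-in for the paper's metrics); unknown edges score 0.
    """
    seen = set()
    kept = []
    for inp in inputs:
        new_edges = inp.edges - seen
        if not new_edges:
            continue  # no new coverage, so the input is not kept
        seen |= new_edges
        score = sum(edge_score.get(e, 0) for e in new_edges)
        kept.append((score, inp))
    # Stable sort: inputs with equal scores keep their discovery order.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [inp.name for _, inp in kept]


# Hypothetical scores: edge "e3" is assumed to be the most security-relevant.
edge_score = {"e3": 5, "e4": 1}
queue = prioritize(
    [
        SeedInput("a", frozenset({"e1", "e2"})),
        SeedInput("b", frozenset({"e2", "e4"})),
        SeedInput("c", frozenset({"e1", "e3"})),
        SeedInput("d", frozenset({"e1"})),
    ],
    edge_score,
)
print(queue)  # ['c', 'b', 'a']
```

Input `d` is dropped because it adds no new edge, as in any coverage-guided fuzzer; the other three are all kept, but `c` moves to the front because its new edge carries the highest assumed score.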