3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers
CoRR (2024)
Abstract
Improper parsing of attacker-controlled input is a leading source of software
security vulnerabilities, especially when programmers transcribe informal
format descriptions in RFCs into efficient parsing logic in low-level,
memory-unsafe languages. Several researchers have proposed formal specification
languages for data formats from which efficient code can be extracted. However,
distilling informal requirements into formal specifications is challenging and,
despite their benefits, new formal languages are hard for people to learn and
use.
In this work, we present 3DGen, a framework that makes use of AI agents to
transform mixed informal input, including natural language documents (e.g.,
RFCs) and example inputs, into format specifications in a language called 3D. To
support humans in understanding and trusting the generated specifications,
3DGen uses symbolic methods to also synthesize test inputs that can be
validated against an external oracle. Symbolic test generation also helps in
distinguishing multiple plausible solutions. Through a process of repeated
refinement, 3DGen produces a 3D specification that conforms to a test suite,
and which yields safe, efficient, provably correct parsing code in C.
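
The refinement process described above can be illustrated with a toy sketch. This is not 3DGen's implementation: candidate specifications are modelled here as plain Python validators rather than 3D programs, the fixed candidate list stands in for an AI agent proposing revisions, and the small labelled test suite stands in for symbolically generated inputs checked against an external oracle. All names (`loose_spec`, `strict_spec`, `refine`, `TESTS`) and the length-prefixed format are hypothetical.

```python
import struct
from typing import Callable, List, Optional, Sequence, Tuple

# A candidate "specification" is modelled as a validator: bytes -> accept/reject.
Validator = Callable[[bytes], bool]
# A labelled test: (input bytes, verdict supplied by a trusted external oracle).
LabelledTest = Tuple[bytes, bool]


def loose_spec(data: bytes) -> bool:
    """Plausible but wrong: only checks that a 2-byte length header is present."""
    return len(data) >= 2


def strict_spec(data: bytes) -> bool:
    """Also requires the big-endian length field to equal the payload size."""
    if len(data) < 2:
        return False
    (declared,) = struct.unpack(">H", data[:2])
    return declared == len(data) - 2


def failing_tests(spec: Validator, tests: Sequence[LabelledTest]) -> List[LabelledTest]:
    """Return the tests on which the candidate disagrees with the oracle."""
    return [(data, expected) for data, expected in tests if spec(data) != expected]


def refine(candidates: Sequence[Validator],
           tests: Sequence[LabelledTest]) -> Optional[Validator]:
    """Try candidates in order; return the first that conforms to every test.

    In a real loop the failing tests would be fed back to the generator to
    produce the next candidate; here the candidates are fixed in advance.
    """
    for spec in candidates:
        if not failing_tests(spec, tests):
            return spec
    return None


# Hypothetical oracle-labelled inputs for a length-prefixed message format.
TESTS: List[LabelledTest] = [
    (b"\x00\x02ab", True),   # declared length 2, payload is 2 bytes
    (b"\x00\x05ab", False),  # declared length 5, payload is 2 bytes
    (b"\x00", False),        # truncated header
]

chosen = refine([loose_spec, strict_spec], TESTS)
```

The second test input is the one that tells the two plausible candidates apart: both accept the well-formed message, but only the stricter one rejects a length field that disagrees with the payload.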
We have evaluated 3DGen on 20 Internet standard formats, demonstrating the
potential for AI agents to produce formally verified C code at a non-trivial
scale. A key enabler is the use of a domain-specific language to limit AI
outputs to a class for which automated, symbolic analysis is tractable.