FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs
arXiv (2024)
Abstract
Analyzing the behavior of cryptographic functions in stripped binaries is a
challenging but essential task. Cryptographic algorithms exhibit greater
logical complexity compared to typical code, yet their analysis is unavoidable
in areas such as virus analysis and legacy code inspection. Existing methods
often rely on data or structural pattern matching, which generalizes poorly
and requires manual effort. In this paper, we propose a
novel framework called FoC to Figure out the Cryptographic functions in
stripped binaries. In FoC, we first build a binary large language model
(FoC-BinLLM) to summarize the semantics of cryptographic functions in natural
language. The prediction of FoC-BinLLM is insensitive to minor changes, such as
vulnerability patches. To mitigate this, we further build a binary code
similarity model (FoC-Sim) upon the FoC-BinLLM to create change-sensitive
representations and use it to retrieve similar implementations of unknown
cryptographic functions in a database. In addition, we construct a
cryptographic binary dataset for evaluation and to facilitate further research
in this domain. We also devise an automated method to create semantic labels
for a large number of binary functions. Evaluation results demonstrate that FoC-BinLLM
outperforms ChatGPT by 14.61% [...]
previous best methods with a 52% [...]
shows practical ability in virus analysis and 1-day vulnerability detection.
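
The retrieval step described above, looking up similar implementations of an unknown function in a database of representations, can be illustrated with a minimal sketch. This is not the FoC-Sim model: it assumes each function has already been mapped to a fixed-length embedding vector and ranks candidates by cosine similarity, a common choice for this kind of search. The function names, vectors, and the helpers `cosine` and `retrieve` are hypothetical and exist only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, database, top_k=1):
    """Return the top_k (name, score) pairs from database, most similar first."""
    scored = [(name, cosine(query, vec)) for name, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real system would obtain these from a trained model.
database = {
    "aes_encrypt_v1": [0.9, 0.1, 0.0],
    "aes_encrypt_patched": [0.8, 0.3, 0.1],
    "sha256_transform": [0.0, 0.2, 0.9],
}

# Embedding of an unknown function taken from a stripped binary (also made up).
query = [0.88, 0.12, 0.02]

best = retrieve(query, database, top_k=2)
for name, score in best:
    print(f"{name}: {score:.3f}")
```

In this toy setup the patched and unpatched variants both rank far above the unrelated hash function yet receive different scores, which is the property a change-sensitive representation is meant to provide.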