Learning and Fusing Multi-View Code Representations for Function Vulnerability Detection

ELECTRONICS(2023)

Abstract
The explosive growth of vulnerabilities poses a significant threat to the security of software systems. While various deep-learning-based vulnerability detection methods have emerged, they primarily rely on semantic features extracted from a single code representation structure, which limits their ability to detect vulnerabilities hidden deep within the code. To address this limitation, we propose S²FVD, short for Sequence and Structure Fusion-based Vulnerability Detector, which fuses vulnerability-indicative features learned from multiple views of the code for more accurate vulnerability detection. Specifically, S²FVD employs either well-matched or carefully extended neural network models to extract vulnerability-indicative semantic features from the token sequence, attributed control flow graph (ACFG), and abstract syntax tree (AST) representations of a function, respectively. These features capture different perspectives of the code and are then fused, enabling S²FVD to accurately detect vulnerabilities that are well hidden within a function. Experiments conducted on two large vulnerability datasets demonstrated the superior performance of S²FVD against state-of-the-art approaches, with its accuracy and F1 scores reaching 98.07% and 98.14%, respectively, in detecting the presence of vulnerabilities, and 97.93% and 97.94%, respectively, in pinpointing specific vulnerability types. Furthermore, on the real-world dataset D2A, S²FVD achieved average performance gains of 6.86% and 14.84% in accuracy and F1, respectively, over the state-of-the-art baselines. The ablation study also confirms that fusing the semantics implied in multiple distinct code views further enhances vulnerability detection performance.
Keywords
vulnerability,code,function,multi-view
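
The multi-view fusion idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the three feature vectors are combined or classified, so everything below is an assumption made for illustration: fusion is shown as plain concatenation, the classifier is a single logistic unit, and the names `fuse_views` and `predict_vulnerable` are hypothetical. The per-view feature vectors would come from the sequence, ACFG, and AST encoders, which are not reproduced here.

```python
import math
from typing import List, Sequence


def fuse_views(
    seq_feat: Sequence[float],
    acfg_feat: Sequence[float],
    ast_feat: Sequence[float],
) -> List[float]:
    """Fuse per-view feature vectors by concatenation.

    Assumption: concatenation stands in for whatever fusion
    the paper actually uses; the abstract does not specify it.
    """
    return [*seq_feat, *acfg_feat, *ast_feat]


def predict_vulnerable(
    fused: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Score a fused vector with a single logistic unit (hypothetical classifier).

    Returns a value in (0, 1) that can be read as the probability
    that the function is vulnerable.
    """
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy feature vectors standing in for encoder outputs (made-up values).
seq_feat = [0.2, 0.7]   # token-sequence view
acfg_feat = [0.1]       # attributed control flow graph view
ast_feat = [0.9, 0.4]   # abstract syntax tree view

fused = fuse_views(seq_feat, acfg_feat, ast_feat)
score = predict_vulnerable(fused, weights=[0.5, -0.3, 1.2, 0.8, -0.1])
```

Concatenation keeps each view's features intact and leaves it to the classifier to weigh them, which is the simplest way to let one decision draw on all three representations at once.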