GenTAL: Generative Denoising Skip-gram Transformer for Unsupervised Binary Code Similarity Detection

IJCNN (2023)

Abstract
Binary code similarity detection plays a critical role in cybersecurity. It alleviates the heavy manual effort required in reverse engineering for malware analysis and vulnerability detection, where the original source code is often unavailable. Most existing solutions rely on manual feature engineering and customized code-matching algorithms, which are inefficient and inaccurate. Recent deep learning-based solutions embed the semantics of binary code into a latent space through supervised contrastive learning. However, no training set can cover all possible forms of the same semantics, so such models struggle to learn its full variance. In this paper, we propose an unsupervised model that aims to learn an intrinsic representation of assembly code semantics. Specifically, we propose a Transformer-based, auto-encoder-like language model for the low-level grammar of assembly code that captures an abstract semantic representation. By coupling a Transformer encoder with a skip-gram-style loss, it learns a compact representation that is robust to different compilation options. We conduct experiments on four block-level code similarity tasks; the results show that our method is more robust than state-of-the-art solutions.
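The abstract pairs a denoising objective (corrupt the input, recover it) with a skip-gram-style loss (score true context tokens above sampled negatives). The sketch below illustrates that combination on toy data; it is not the paper's implementation. The token names, the mean-pooling stand-in for the Transformer encoder, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary of assembly tokens (illustrative only).
vocab = ["mov", "eax", "ebx", "add", "push", "pop", "ret", "[MASK]"]
dim = 16
emb = rng.normal(scale=0.1, size=(len(vocab), dim))  # input embeddings
out = rng.normal(scale=0.1, size=(len(vocab), dim))  # output embeddings


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def encode(token_ids):
    """Stand-in for the Transformer encoder: mean-pool token embeddings
    into a single block-level vector (the 'compact representation')."""
    return emb[token_ids].mean(axis=0)


def denoising_skipgram_loss(token_ids, masked_pos, n_negatives=3):
    """Corrupt the block by masking one token, encode the noisy block,
    then require the block vector to score the original token above
    random negatives -- a skip-gram negative-sampling objective."""
    noisy = list(token_ids)
    target = noisy[masked_pos]
    noisy[masked_pos] = vocab.index("[MASK]")  # denoising corruption
    z = encode(noisy)
    pos_score = sigmoid(z @ out[target])                     # true token
    negs = rng.choice(len(vocab) - 1, size=n_negatives,
                      replace=False)                         # negatives
    neg_score = sigmoid(-(out[negs] @ z))
    return float(-np.log(pos_score) - np.log(neg_score).sum())


block = [vocab.index(t) for t in ["push", "mov", "eax", "ebx", "add", "ret"]]
loss = denoising_skipgram_loss(block, masked_pos=2)
print(loss)  # positive scalar; minimized during training
```

In the paper's setting, gradients of such a loss would be backpropagated through the encoder so that blocks with the same semantics, compiled under different options, map to nearby vectors.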
Keywords
abstract semantic representation,assembly code semantics,code matching algorithms,different block-level code similarity tasks,existing solutions focus,generative denoising skip-gram Transformer,huge manual effort,low-level assembly code grammar,malware analysis,manual feature engineering process,original source code,recent deep learning-based solutions,reverse engineering process,skip-gram style loss design,supervised contrastive learning,Transformer encoder,Transformer-based auto-encoder,unsupervised binary code similarity detection,unsupervised model,vulnerability detection