Milo: Attacking Deep Pre-trained Model for Programming Languages Tasks with Anti-analysis Code Obfuscation.

COMPSAC (2023)

Abstract
Deep neural networks, especially pre-trained BERT models, have been widely applied to programming language processing tasks and have achieved promising results. Their downstream applications, such as code clone detection and code search, play a crucial role in data-driven security solutions such as vulnerability analysis. However, the resilience of these models against anti-analysis attacks remains unexplored. We therefore investigate whether deep neural networks can maintain the same performance under different types of code change and what types of biases are introduced during learning. We introduce a new code obfuscation tool, the Multiprogramming-language Obfuscator (Milo), for programming language processing tasks. Milo can be used to generate adversarial data to verify a model's generalizability and robustness against code obfuscation. Milo supports five obfuscation methods: variable renaming, method renaming, string splitting, operation substitution, and control flow shuffling, in three mainstream programming languages: Java, Python, and JavaScript. It is designed to apply anti-analysis obfuscation techniques across different programming languages, altering the syntactic and semantic features of a code snippet. To quantify the adverse effects of anti-analysis techniques on pre-trained models for programming languages, we performed extensive experiments across several pre-trained models, BERT, CodeBERT, and GraphCodeBERT, on four downstream tasks: code documentation generation, code clone detection, code search, and code translation. Our results indicate that most pre-trained BERT models are susceptible to code obfuscation and rely heavily on the literal representations (names or strings) of the code segment.
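The abstract does not describe how Milo implements its transformations. As an illustration only, two of the five listed obfuscation methods, string splitting and operation substitution, can be sketched with simple regex rewrites in Python (the function names and regex-based approach here are hypothetical and are not Milo's actual implementation, which targets Java, Python, and JavaScript):

```python
import re

def split_strings(code: str) -> str:
    """String splitting: break each string literal "abcd" into "ab" + "cd".

    The code is semantically unchanged, but models that key on the
    literal token "abcd" no longer see it in one piece.
    """
    def split(m: re.Match) -> str:
        s = m.group(1)
        if len(s) < 2:
            return m.group(0)  # too short to split
        mid = len(s) // 2
        return f'"{s[:mid]}" + "{s[mid:]}"'
    return re.sub(r'"([^"]*)"', split, code)

def substitute_operations(code: str) -> str:
    """Operation substitution: rewrite `a + b` as `a - (-b)`.

    Equivalent for numeric operands; this toy version only matches
    bare identifiers (\\w+), so string concatenations are left alone.
    """
    return re.sub(r'(\w+)\s*\+\s*(\w+)', r'\1 - (-\2)', code)

snippet = 'total = x + y\nmsg = "password"'
print(substitute_operations(split_strings(snippet)))
# prints:
# total = x - (-y)
# msg = "pass" + "word"
```

A real obfuscator would operate on a parse tree rather than raw text, which is what makes the remaining methods (variable renaming, method renaming, control flow shuffling) feasible across languages; the regex sketch above is only meant to show why such transformations preserve behavior while disturbing surface features.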
Keywords
programming languages tasks,code,pre-trained,anti-analysis