Software Vulnerability Prediction in Low-Resource Languages: An Empirical Study of CodeBERT and ChatGPT

Triet Huynh Minh Le, M. Ali Babar, Tung Hoang Thai

Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, EASE 2024 (2024)

Abstract

Background: Software Vulnerability (SV) prediction in emerging languages is increasingly important to ensure software security in modern systems. However, these languages usually have limited SV data for developing high-performing prediction models. Aims: We conduct an empirical study to evaluate the impact of SV data scarcity in emerging languages on the state-of-the-art SV prediction model and investigate potential solutions to enhance its performance. Method: We train and test the state-of-the-art model based on CodeBERT, with and without data sampling techniques, for function-level and line-level SV prediction in three low-resource languages: Kotlin, Swift, and Rust. We also assess the effectiveness of ChatGPT for low-resource SV prediction, given its recent success in other domains. Results: Compared to the original work in C/C++ with large datasets, CodeBERT's performance in function-level and line-level SV prediction declines significantly in low-resource languages, signifying the negative impact of data scarcity. Regarding remediation, data sampling techniques fail to improve CodeBERT, whereas ChatGPT shows promising results, substantially enhancing predictive performance by up to 34.4% at the function level and up to 53.5% at the line level. Conclusion: We have highlighted the challenge and made the first promising step for low-resource SV prediction, paving the way for future research in this direction.
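The abstract does not name the data sampling techniques that were tried. As an illustration only, the sketch below shows random oversampling, one common sampling technique, assuming the goal is to balance scarce vulnerable functions (label 1) against non-vulnerable ones (label 0) before fine-tuning. The function `random_oversample` and the toy Rust snippets are hypothetical; this is not the authors' pipeline.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen samples of each
    minority class until every class matches the majority-class count."""
    rng = random.Random(seed)
    target = max(Counter(labels).values())

    # Group samples by their label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Start from the original data and append random duplicates.
    out_x, out_y = list(samples), list(labels)
    for y, xs in by_class.items():
        for _ in range(target - len(xs)):
            out_x.append(rng.choice(xs))
            out_y.append(y)
    return out_x, out_y


# Toy function-level dataset: 3 non-vulnerable, 1 vulnerable (label 1).
funcs = ["fn a() {}", "fn b() {}", "fn c() {}", "fn d() { unsafe {} }"]
labels = [0, 0, 0, 1]
bx, by = random_oversample(funcs, labels)
print(Counter(by))  # each class now has 3 samples
```

Oversampling only repeats existing examples; it adds no new vulnerability patterns, which is one plausible reason such techniques may not help when the underlying data is scarce.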

Keywords

Software vulnerability,Software security,Large language models,ChatGPT,Empirical study
