Code Vulnerability Detection via Signal-Aware Learning.

EuroS&P (2023)

Abstract
Machine Learning-based modeling of source code understanding tasks has been gaining popularity. Accompanying their rapid proliferation is an emerging scrutiny over the models' reliability. Concerns have been raised regarding the models not actually learning task-relevant source code features, but fitting other correlated data. To improve model trustworthiness, in this work, we explore data-driven approaches for enhancing model signal awareness, i.e., learning the relevant signals in the input for making predictions. We do so by incorporating the notion of code complexity during model training, both (i) explicitly via curriculum learning, and (ii) implicitly by augmenting the training dataset with simplified signal-preserving programs. With our techniques, we achieve up to 4.8x improvement in signal awareness of vulnerability detection models. Using the notion of code complexity, we present a novel interpretation of the model learning behaviour from the perspective of the dataset. We use it to introspect model learning difficulties, and analyze the learning enhancements achieved with our approaches.
Keywords
code complexity, code vulnerability detection, curriculum learning, data-driven approaches, learning enhancements, machine learning, model learning behaviour, model signal awareness, model training, model trustworthiness, relevant signals, signal-aware learning, simplified signal-preserving programs, source code understanding tasks, task-relevant source code features, vulnerability detection models
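
The abstract describes feeding code complexity into training explicitly through curriculum learning. A minimal sketch of that idea follows, assuming a toy complexity proxy: the abstract does not say which complexity metric the authors use, so `complexity_proxy`, `curriculum_order`, and `curriculum_stages` are hypothetical names and the branch-keyword count is only a stand-in.

```python
import re
from typing import Iterator, List, Tuple

Sample = Tuple[str, int]  # (source code, label); label 1 = vulnerable (assumed encoding)

# Assumption: count branching keywords as a crude stand-in for code complexity.
# The paper's actual metric is not given in the abstract.
BRANCH_PATTERN = re.compile(r"\b(if|for|while|case|switch)\b")


def complexity_proxy(source: str) -> int:
    """Return the number of branching keywords found in the source text."""
    return len(BRANCH_PATTERN.findall(source))


def curriculum_order(samples: List[Sample]) -> List[Sample]:
    """Sort samples from lowest to highest proxy complexity (stable sort)."""
    return sorted(samples, key=lambda sample: complexity_proxy(sample[0]))


def curriculum_stages(samples: List[Sample], num_stages: int) -> Iterator[List[Sample]]:
    """Yield cumulative training subsets, easiest samples first.

    Stage k contains roughly the easiest k/num_stages share of the data,
    so the final stage is the full dataset.
    """
    ordered = curriculum_order(samples)
    for k in range(1, num_stages + 1):
        cutoff = max(1, len(ordered) * k // num_stages)
        yield ordered[:cutoff]


samples = [
    ("if (a) { while (b) { if (c) x(); } }", 1),
    ("return 0;", 0),
    ("for (;;) {}", 0),
]
stages = list(curriculum_stages(samples, num_stages=3))
```

In a training loop, each stage would be passed to the model in turn, so simpler programs are seen before more complex ones. The implicit variant mentioned in the abstract (augmenting the dataset with simplified signal-preserving programs) needs a program-reduction step that is not sketched here.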