An Empirical Study of Class Rebalancing Methods for Actionable Warning Identification

IEEE TRANSACTIONS ON RELIABILITY (2023)

Abstract
Actionable warning identification (AWI) is crucial for improving the usability of static analysis tools. Machine learning (ML)-based AWI approaches are now common; they mainly pursue high performance by improving warning feature extraction and advancing AWI model training. However, these approaches overlook an important fact: the number of actionable warnings is much smaller than that of unactionable warnings in the warning datasets used for AWI model training (i.e., class imbalance). Learning from such imbalanced datasets may limit the performance of ML-based AWI approaches. To bridge this gap, we conduct the first comprehensive empirical study of the impact of class imbalance on ML-based AWI performance, of whether class rebalancing methods can improve that performance, and of the differences among class rebalancing methods in ML-based AWI models. Our empirical study covers nine real-world, large-scale warning datasets, 25 typical class rebalancing methods, and seven commonly used ML models. The experimental results show that 1) class imbalance has a negative impact on ML-based AWI performance; 2) 85% of the class rebalancing methods significantly improve ML-based AWI performance, while 8% do not work on the imbalanced warning datasets; 3) RandomOverSampler combined with AdaBoost or Random Forest achieves optimal ML-based AWI performance across the nine warning datasets. Finally, we provide three practical guidelines that could help refine ML-based AWI approaches.
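The rebalancing strategy the study finds most effective, random oversampling, can be illustrated with a minimal sketch. This is a hypothetical pure-Python helper (`random_oversample`) that mimics the default behaviour of imbalanced-learn's RandomOverSampler, not the authors' implementation: minority-class warnings are duplicated at random until every class matches the size of the majority class.

```python
import random

def random_oversample(X, y, seed=0):
    """Duplicate minority-class samples at random until all classes
    are balanced (an illustrative stand-in for RandomOverSampler)."""
    rng = random.Random(seed)
    # Group feature rows by their class label.
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    # Every class is upsampled to the majority-class size.
    target = max(len(samples) for samples in by_class.values())
    X_bal, y_bal = [], []
    for label, samples in by_class.items():
        extra = [rng.choice(samples) for _ in range(target - len(samples))]
        for s in samples + extra:
            X_bal.append(s)
            y_bal.append(label)
    return X_bal, y_bal

# Toy warning dataset: 8 unactionable vs. 2 actionable warnings.
X = [[i] for i in range(10)]
y = ["unactionable"] * 8 + ["actionable"] * 2
X_bal, y_bal = random_oversample(X, y)
```

The balanced output can then be fed to any classifier (e.g., AdaBoost or Random Forest in scikit-learn); only the training split should be rebalanced, never the test split.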
Keywords
Actionable warning identification (AWI), class imbalance, class rebalancing methods, machine learning (ML), static analysis warnings