Software Fault Prediction for Imbalanced Data: A Survey on Recent Developments

Procedia Computer Science (2023)


Abstract

Software fault prediction is the process of identifying faults in a software system. Predicting faults at early stages helps manage the resources and time required for software testing and maintenance: the identified modules can be fixed ahead of time, saving time and money near the end of the software development process. Over the years, various supervised machine learning techniques for fault prediction have been proposed. The accuracy of these models depends on the training datasets. The models are built and trained on a labeled dataset consisting of multiple independent variables, such as lines of code, software complexity, and software size, and a binary dependent variable that is either true or false. However, a fault dataset may suffer from issues such as class overlap, class imbalance, and null values. Recent research in software fault prediction focuses on data quality. An imbalanced dataset is one in which one class (the majority class) has far more instances than the other (the minority class). Models built on imbalanced datasets are biased toward the majority class, which results in inaccurate predictions. Therefore, balancing the dataset is important. This paper discusses the most recent software fault prediction algorithms that focus on the class imbalance issue. It also presents a comparative overview, which should help researchers select the most suitable fault prediction techniques for different datasets and algorithms. According to the survey, SMOTE is the most commonly used data sampling technique for dealing with data quality issues.
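
To make the idea of balancing by oversampling concrete, the following is a minimal sketch of a SMOTE-style procedure. It is not taken from the surveyed paper and is a simplified variant for illustration only. The function name `smote_like_oversample`, the toy feature pairs (lines of code, complexity), and all parameter values are assumptions. Each synthetic minority sample is created by interpolating between a randomly chosen minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Simplified, illustrative variant of SMOTE; not a reference implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: (lines_of_code, complexity) per software module.
faulty = [(120.0, 14.0), (200.0, 22.0), (150.0, 18.0)]  # minority class
clean = [
    (30.0, 2.0), (45.0, 3.0), (50.0, 4.0), (25.0, 1.0),
    (60.0, 5.0), (40.0, 3.0), (35.0, 2.0), (55.0, 4.0),
]  # majority class

# Generate enough synthetic faulty modules to match the majority class size.
new_faulty = smote_like_oversample(faulty, n_new=len(clean) - len(faulty))
balanced_faulty = faulty + new_faulty
```

After this step, `balanced_faulty` has as many entries as `clean`, so a classifier trained on both lists sees the two classes equally often. In practice, oversampling of this kind is normally applied to the training split only, so that the test data keeps its original class distribution.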


Keywords

imbalanced data, prediction, software
