Empirical analysis of threshold values for rank-based filter feature selection methods in software defect prediction

Malek Almomani, Abdullateef O. Balogun, Shuib Basri, Abdullahi A. Imam, Ammar K. Alazzawi, Victor E. Adeyemo, Ganesh Kumar

Journal of Engineering Science and Technology (2023)

Abstract

Many studies have explored the influence of feature selection (FS) techniques on software defect prediction (SDP) models, with conflicting empirical results and research outcomes. These contradictions may stem from limitations of the individual studies, such as the types of FS techniques used or the size of the defect datasets. For FS methods in particular, the choice of threshold value for picking top-ranked features may be one cause of the discrepancies in reported SDP findings. Investigating and assessing the impact of threshold values for rank-based filter (RBF) FS techniques, as done in this work, is therefore critical. Four RBF methods (Chi-square, Correlation, Information Gain, and Relief) with five threshold values (No FS, log2N, Top 20%, Top 30%, and Top 50%) were investigated with two prediction models (Naive Bayes (NB) and Decision Tree (DT)) on 25 software defect datasets. The RBF techniques were selected for their distinct computational characteristics, to ensure heterogeneity, as well as for their performance in current SDP research. The resulting SDP models were evaluated using accuracy and area under the curve (AUC) values, and the Scott-Knott ESD statistical rank test was used to rank the RBF methods under each threshold value. According to the experimental results, selecting the Top 20% of ranked features in RBF methods had a greater positive impact on the prediction performance of SDP models than the other threshold values. The outcomes of this study also corroborate previous research on the capacity of FS techniques to improve the prediction performance of SDP models. Consequently, we urge that FS methods be used in SDP tasks. For RBF methods, the Top 20% threshold value should be used, since it outperforms the de facto log2N and the other threshold values. Moreover, findings from this study can guide subsequent SDP studies and further strengthen the robustness of experimental findings and conclusions in SDP research.
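The threshold values named in the abstract each translate into a count of top-ranked features to keep. The following is a minimal sketch of that translation, not the authors' implementation. The function names, the threshold labels (`"none"`, `"log2n"`, `"top20"`, and so on), and the metric names in the usage example are hypothetical, and rounding up to a whole number of features is an assumption, since the abstract does not say how fractional counts are handled.

```python
import math


def threshold_k(n_features, threshold):
    """Return how many top-ranked features a threshold keeps.

    Rounding up (ceil) is an assumption; the abstract does not specify it.
    """
    if threshold == "none":        # No FS: keep every feature
        return n_features
    if threshold == "log2n":       # log2N of the total feature count
        return max(1, math.ceil(math.log2(n_features)))
    if threshold.startswith("top"):  # e.g. "top20" keeps the top 20%
        percent = int(threshold[3:])
        return max(1, math.ceil(percent * n_features / 100))
    raise ValueError(f"unknown threshold: {threshold!r}")


def select_top_ranked(scores, threshold):
    """Rank features by filter score (highest first) and keep the top k.

    `scores` maps feature name to a score from any rank-based filter,
    such as Chi-square or Information Gain.
    """
    k = threshold_k(len(scores), threshold)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical scores for four made-up software metrics.
    demo_scores = {"loc": 0.9, "cc": 0.5, "fanout": 0.1, "npm": 0.7}
    for name in ("none", "log2n", "top20", "top30", "top50"):
        print(name, select_top_ranked(demo_scores, name))
```

Under these assumptions, a dataset with 40 features keeps 6 features under log2N and 8 under Top 20%, which shows how the two thresholds can select noticeably different subsets from the same ranking.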

Keywords

Feature selection, Rank-based filter, Software defect prediction
