
Empirical analysis of threshold values for rank-based filter feature selection methods in software defect prediction

Journal of Engineering Science and Technology (2023)

Abstract
Many studies have explored the influence of feature selection (FS) techniques on software defect prediction (SDP) models, with conflicting empirical results and research outcomes. These contradictions may stem from limitations of the individual studies, such as the types of FS techniques used or the size of the defect datasets. For FS methods in particular, the choice of threshold value for picking top-ranked features may be a cause of the discrepancies in reported SDP findings. Investigating and assessing the impact of threshold values for rank-based filter (RBF) FS techniques, as done in this work, is therefore critical. Four RBF methods (Chi-square, Correlation, Information Gain, and Relief) with five threshold values (No FS, log2N, Top 20%, Top 30%, and Top 50%) were investigated with two prediction models (Naive Bayes (NB) and Decision Tree (DT)) on 25 software defect datasets. The RBF techniques were selected for their distinct computational characteristics, to ensure heterogeneity, and for their performance in existing SDP research. The resulting SDP models were evaluated using accuracy and area under the curve (AUC), while the Scott-Knott ESD statistical test was used to rank the RBF methods with the applied threshold values. According to the experimental results, selecting the top 20% of ranked features in RBF methods had a greater positive impact on the prediction performance of SDP models than the other threshold values. Furthermore, the outcomes of this study corroborate previous research on the capacity of FS techniques to improve the prediction performance of SDP models. Consequently, we urge that FS methods be used in SDP tasks. For RBF methods, the Top 20% threshold value should be used, since it outperforms the de facto log2N and the other threshold values. Moreover, the findings of this study can guide subsequent SDP studies and strengthen the robustness of experimental findings and conclusions in SDP research.
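
The threshold values named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that N is the number of features in a dataset, that fractional counts are rounded up, and that each feature already has a relevance score from some rank-based filter. The function names, the threshold encoding, and the example metric names are hypothetical.

```python
import math


def n_features_to_keep(n_features, threshold):
    """Return how many top-ranked features a threshold keeps.

    threshold is "none" (No FS), "log2n", or an integer percentage
    such as 20 for Top 20%. Rounding up is an assumption; the
    abstract does not say how fractional counts are handled.
    """
    if threshold == "none":
        return n_features
    if threshold == "log2n":
        k = math.ceil(math.log2(n_features))
    else:
        # Integer ceiling of n_features * threshold / 100,
        # which avoids floating-point rounding surprises.
        k = -(-n_features * int(threshold) // 100)
    # Always keep at least one feature and never more than exist.
    return max(1, min(n_features, k))


def select_top_ranked(scores, threshold):
    """Rank features by score (highest first) and keep the top ones.

    scores maps a feature name to the relevance score produced by a
    rank-based filter such as Chi-square or Information Gain.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_features_to_keep(len(ranked), threshold)]


if __name__ == "__main__":
    # Hypothetical scores for five made-up software metrics.
    example_scores = {
        "loc": 0.9,
        "halstead": 0.7,
        "cc": 0.5,
        "comments": 0.2,
        "fanout": 0.1,
    }
    for t in ("none", "log2n", 20, 30, 50):
        print(t, select_top_ranked(example_scores, t))
```

Under these assumptions, a dataset with 20 features keeps 5 features with log2N and 4 with Top 20%, so the two thresholds produce feature subsets of similar size on datasets of that width and diverge as the number of features grows.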
Keywords
Feature selection, Rank-based filter, Software defect prediction