Negative Transfer in Cross Project Defect Prediction: Effect of Domain Divergence

2022 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 2022

Abstract
Cross-project defect prediction (CPDP) models are applied to new software projects to improve defect prediction rates. Developing these models is challenging when the new project has little or no historical data. Researchers may therefore need to rely on multiple data sources and use transfer learning-based CPDP to build defect prediction models. The source data are typically taken from similar and related projects, but their distributions can differ from that of the new software project (the target data). Although transfer learning-based CPDP models are designed to handle these distribution differences, differences that the model does not handle correctly may lead to negative transfer. Recent work has focused on building transfer CPDP models, but little is known about how similar or dissimilar the sources should be to avoid negative transfer. This paper provides the first empirical investigation of the effect of combining sources with different levels of similarity in transfer CPDP. We introduce the use of the Population Stability Index (PSI) to interpret whether the distribution of the combined or single-source data is similar to that of the target data. This was validated using an adversarial approach. Experimental results on three public datasets reveal that when the source and target distributions are very similar, the probability of false alarm improves by 3% to 7% and the recall indicator is reduced by 1% to 8%. Interestingly, we also found that when dissimilar source data are combined with different source datasets, the overall domain divergence is lowered and performance improves. The results highlight the importance of using the right source to aid the learning process.
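The abstract does not spell out how PSI is computed, so the following is only a minimal sketch of the commonly used definition, PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ), where eᵢ and aᵢ are the proportions of source and target values falling in bin i. The function name `psi`, the equal-width binning over the source range, the bin count, and the `eps` floor for empty bins are all assumptions made for illustration and may differ from the paper's setup.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index for one feature (illustrative sketch).

    expected: feature values from the source project(s)
    actual:   feature values from the target project
    Bin edges are equal-width over the range of `expected` (an assumption).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        total = len(values)
        # Floor each proportion at eps so the logarithm is always defined.
        return [max(c / total, eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical data: one metric from a source project and a shifted target.
source = [i / 100 for i in range(100)]
target = [0.5 + i / 200 for i in range(100)]
print(psi(source, source))  # identical samples give zero divergence
print(psi(source, target))  # shifted sample gives a clearly positive value
```

A widely quoted rule of thumb, not taken from this paper, reads PSI below 0.1 as little shift and above 0.25 as substantial shift. In a multi-source setting, one could compute this per feature between the pooled source data and the target data.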
Keywords
cross-project defect prediction, negative transfer, transfer learning, data shift