A Combination Method of Resampling and Random Forest for Imbalanced Data Classification

Liu Zheng, Qiu Han, Zhu Junhu

2022 4th International Conference on Advances in Computer Technology, Information Science and Communications (CTISC) (2022)

Abstract

In imbalanced data classification, existing methods that combine resampling with random forests are strongly affected by the feature dimensionality of the training set: the quality of the resampling depends heavily on the number of features, which leads to unsatisfactory classification performance. To address this problem, this paper proposes a method that combines resampling with random forests for imbalanced data classification. The core idea is to resample the original data set multiple times, once for each subtree in the random forest, using that subtree's feature subset. Because each resampling uses fewer features, the influence of feature dimensionality on resampling quality is reduced; because different subtrees resample on different feature subsets, the diversity of the base classifiers increases. Building on the SMOTE, B-SMOTE and ADASYN resampling techniques, three optimized algorithms, PSRF, PBSRF and PADRF, are implemented, respectively. With the geometric mean (G-mean) and recall as evaluation metrics, experiments on 10 KEEL data sets show that, compared with the corresponding algorithms before optimization, the three algorithms increase the G-mean by 0.85%, 2.93% and 0.79% and the recall by 1.44%, 3.26% and 1.07%, respectively.
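The core idea (resampling separately inside each subtree's feature subset) can be illustrated with a small, dependency-free Python sketch. It is not the authors' PSRF code, and several details are assumptions not stated in the abstract: each tree draws one feature subset for its whole lifetime (random-subspace style), a bootstrap sample is drawn before resampling, plain SMOTE is run until the two classes are balanced within that subspace, and the base learner is a toy depth-limited Gini tree. All function names (`smote`, `build_tree`, `fit_subspace_smote_forest`, `predict`, `recall_and_gmean`) are hypothetical. The evaluation helper uses the standard definitions recall = TP/(TP+FN) and G-mean = sqrt(recall × specificity).

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k, rng):
    """Plain SMOTE: interpolate between a minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        return []
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Neighbours are searched only in the current (reduced) feature space.
        neighbours = sorted((p for p in minority if p is not x), key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, depth=0, max_depth=4, min_size=2):
    """Toy CART-style tree: leaves are labels, inner nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1 or len(y) < min_size:
        return majority
    best = None
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            if len(left) in (0, len(X)):
                continue
            left_set = set(left)
            right = [i for i in range(len(X)) if i not in left_set]
            score = (len(left) * gini([y[i] for i in left]) + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], depth + 1, max_depth, min_size)

    return (f, t, grow(left), grow(right))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_subspace_smote_forest(X, y, n_trees=10, n_features=None, minority=1, k=5, seed=0):
    """Each tree draws its own feature subset and runs SMOTE only inside that subspace."""
    rng = random.Random(seed)
    d = len(X[0])
    n_features = n_features or max(1, int(math.sqrt(d)))
    forest = []
    for _ in range(n_trees):
        feats = sorted(rng.sample(range(d), n_features))  # assumption: one subset per tree
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # assumption: bootstrap before SMOTE
        Xs = [[X[i][f] for f in feats] for i in idx]
        ys = [y[i] for i in idx]
        mino = [row for row, label in zip(Xs, ys) if label == minority]
        # Assumption: oversample the minority class to a 1:1 ratio within this subspace.
        new = smote(mino, len(ys) - 2 * len(mino), k, rng)
        Xs += new
        ys += [minority] * len(new)
        forest.append((feats, build_tree(Xs, ys)))
    return forest


def predict(forest, x):
    # Majority vote; each tree only sees the features it was trained on.
    votes = Counter(predict_tree(tree, [x[f] for f in feats]) for feats, tree in forest)
    return votes.most_common(1)[0][0]


def recall_and_gmean(y_true, y_pred, positive=1):
    """Recall = TP/(TP+FN); G-mean = sqrt(recall * specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return recall, math.sqrt(recall * specificity)
```

Typical use is `forest = fit_subspace_smote_forest(X, y, n_features=2)` followed by `predict(forest, x)` for each sample. Swapping the `smote` function for a Borderline-SMOTE or ADASYN routine would give sketches in the spirit of PBSRF and PADRF; because each call only sees `n_features` columns, the nearest-neighbour search that drives the interpolation runs in a lower-dimensional space, which is the effect the abstract attributes to the method.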

Keywords

machine learning, imbalanced data, resampling, Random Forest
