Research on classification method of high-dimensional class-imbalanced datasets based on SVM

International Journal of Machine Learning and Cybernetics (2018)

Abstract
High dimensionality degrades classification because some combinations of features adversely affect the classifier. Class imbalance biases the classifier toward the majority class and away from the minority class, because the majority class has more samples than the minority class. Problems that are both high-dimensional and class-imbalanced arise in many fields, such as bioinformatics and healthcare. Many researchers have studied either the high-dimensional problem or the class-imbalance problem and proposed a series of algorithms, but they ignore the new problem that arises when the two occur together: high dimensionality affects the sampling process, while class imbalance interferes with feature selection. This paper first analyses this new problem, which arises from the mutual influence of the two problems, and then introduces SVM and analyses its advantages in dealing with both high-dimensional and class-imbalanced problems. Next, it proposes a new algorithm, BRFE-PBKS-SVM, aimed at high-dimensional class-imbalanced datasets. The algorithm improves SVM-RFE by taking class imbalance into account during feature selection, and it improves SMOTE so that over-sampling operates in the Hilbert space with an over-sampling rate adapted by PSO. Finally, experimental results show the performance of the algorithm.
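The abstract names SMOTE as the over-sampling method that BRFE-PBKS-SVM modifies. As a point of reference only, the sketch below shows classic input-space SMOTE, which interpolates between a minority sample and one of its k nearest minority neighbours. It is not the paper's PBKS variant: it does not operate in the Hilbert space and does not tune the over-sampling rate with PSO. The function name `smote`, its parameters (`n_new`, `k`, `seed`), and the toy data are illustrative assumptions; `n_new` stands in for the over-sampling rate that the paper adapts.

```python
import random
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples with classic SMOTE.

    Illustrative sketch only (hypothetical helper), NOT the paper's
    kernel-space PBKS variant. Each synthetic point lies on the segment
    between a random minority sample x and one of its k nearest minority
    neighbours x_nn:  x_new = x + lam * (x_nn - x),  lam ~ U[0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    # A sample cannot have more neighbours than there are other samples.
    k = min(k, len(minority) - 1)

    # Precompute the indices of the k nearest minority neighbours of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _sq_dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))            # pick a minority sample
        x = minority[i]
        x_nn = minority[rng.choice(neighbours[i])]  # pick one of its neighbours
        lam = rng.random()                          # interpolation weight in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, x_nn)])
    return synthetic
```

For example, `smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], n_new=6, k=2, seed=42)` returns six new points, each lying on an edge between two adjacent corners and therefore inside the unit square. Replacing the Euclidean distance and the linear interpolation with kernel-space counterparts is the kind of change the abstract describes, but the details of that variant are not given here.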
Keywords
High-dimensional, Class-imbalanced, Feature selection, Boundary samples, Over-sampling