A hybrid sampling method for highly imbalanced and overlapped data classification with complex distribution

Information Sciences (2024)

Abstract
The coupling of class imbalance and class overlap is the primary cause of performance degradation for classifiers. Unfortunately, this coupling is pervasive in practice. In this paper, we propose a hybrid sampling method derived from an optimized generative adversarial network and natural neighbor search (HD-GNNS) for this scenario. The approach considers both the global and the local distribution of the minority class in order to improve the data distribution fundamentally. First, natural neighbor search with Fisher's discriminant ratio is conducted to screen the overlapped sample subset and remove noise samples. Because the search radius is determined adaptively, this step overcomes parameter sensitivity. Then, an encoder with a squeeze-and-excitation block is introduced into the generative adversarial network, and the structure of the network is optimized with cross-layer and low-rank matrices. This better captures the distribution characteristics of the minority samples in the overlapped subset for oversampling. Afterwards, the local density of the majority samples in the overlapped subset is calculated with the same natural neighbor search method, and Thornton's Separation Index is used to perform under-sampling adaptively. We evaluate the proposed approach on 1 artificial dataset, 14 UCI datasets, and 8 real-world datasets. The experimental results show that HD-GNNS performs better than the benchmark methods it is compared against.
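As a rough illustration of the two class-separability measures named in the abstract, the sketch below computes a per-feature Fisher's discriminant ratio and Thornton's separation index with NumPy. It is not the HD-GNNS procedure: the abstract does not give the exact formulas or say how the measures are combined with natural neighbor search, so the textbook two-class definitions are assumed here, and the function names are hypothetical.

```python
import numpy as np


def fisher_discriminant_ratio(X, y):
    """Per-feature Fisher's discriminant ratio for a two-class problem.

    Assumed textbook form: (mu_a - mu_b)^2 / (var_a + var_b) per feature.
    Larger values suggest less overlap between the classes on that feature.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch handles exactly two classes")
    a, b = X[y == classes[0]], X[y == classes[1]]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0)
    # Guard against division by zero on constant features.
    return num / np.maximum(den, 1e-12)


def thornton_separation_index(X, y):
    """Fraction of samples whose nearest neighbor shares their label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Dense pairwise Euclidean distances; fine for small illustrative data.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    nearest = dist.argmin(axis=1)
    return float(np.mean(y[nearest] == y))


# Toy data (made up for illustration): two well-separated classes on
# feature 0, and an uninformative constant feature 1.
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]])
y_toy = np.array([0, 0, 1, 1])
fdr_toy = fisher_discriminant_ratio(X_toy, y_toy)
si_toy = thornton_separation_index(X_toy, y_toy)
```

On this toy data the first feature separates the classes cleanly and the second carries no information, so the ratio is high for the first and zero for the second, while every sample's nearest neighbor has the same label. In an overlapped region both measures would drop, which is the kind of signal a sampling method could use to decide where to oversample or under-sample.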
Keywords
Imbalanced and overlapped data classification, Hybrid sampling, Oversampling derived from generative adversarial network, Under-sampling based on natural neighbor search