Open world long-tailed data classification through active distribution optimization

Expert Systems with Applications (2023)

Abstract
Real-world data often exhibit a long-tailed label distribution, which leads to classification bias. Popular re-sampling and re-weighting methods usually require the category information to be known in advance, so learning from long-tailed data with open categories remains challenging. In this paper, we propose an active distribution optimization algorithm (DALC) to address this problem. Through iterations of clustering, querying, and classification, we explore new categories and balance the label distribution. For clustering, we present an exploration technique that adaptively obtains the optimal data distribution with minimal total distance/cost. For querying, we design a critical instance selection strategy that uses the cluster information. For classification, we establish an ensemble model that continuously balances the label distribution. We conducted experiments on synthetic, benchmark, and domain datasets. Significance tests verified the effectiveness of DALC and its superiority over state-of-the-art long-tailed data classification and open set classification algorithms.
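The cluster, query, and classify iteration described in the abstract can be illustrated with a minimal sketch. This is not the DALC algorithm: plain k-means stands in for the paper's distribution-optimizing clustering, "query the unlabeled point nearest each cluster center" stands in for its critical instance selection, and a 1-nearest-neighbour rule stands in for its ensemble classifier. All function names, parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, k, iters=10, seed=0):
    """Plain k-means; an assumed stand-in for the paper's clustering step."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def active_loop(points, oracle, k=3, rounds=3):
    """Cluster, query one point per cluster, then classify by nearest labeled point."""
    labeled = {}  # point -> label revealed by the oracle
    for r in range(rounds):
        centers = kmeans(points, k, seed=r)
        for c in centers:
            pool = [p for p in points if p not in labeled]
            if not pool:
                break
            # Assumed query rule: the unlabeled point closest to the cluster center.
            q = min(pool, key=lambda p: math.dist(p, c))
            labeled[q] = oracle(q)  # a label outside the known set is a new category
    keys = list(labeled)

    def predict(p):
        # 1-nearest-neighbour prediction from the queried labels only.
        return labeled[keys[nearest(p, keys)]]

    return labeled, predict


if __name__ == "__main__":
    # Toy long-tailed data: 25 "head" points and 3 points in two rare classes.
    head = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
    tail = [(10.0, 10.0), (10.2, 9.9), (-10.0, 10.0)]

    def oracle(p):
        return "a" if p[0] > 5 else ("b" if p[0] < -5 else "head")

    labeled, predict = active_loop(head + tail, oracle)
    print(sorted(set(labeled.values())), predict((9.5, 9.5)))
```

Which categories the loop discovers depends on the k-means initialization, so the sketch makes no claim about recovering every tail class; the paper's clustering and selection strategies are designed to address exactly that.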
Keywords
Active learning, Cost-sensitive, Long-tailed distribution, Open set classification