Online Heterogeneous Streaming Feature Selection without Feature Type Information

IEEE Transactions on Big Data (2024)

Abstract
Feature selection aims to select an optimal minimal subset of features from the original dataset and has become an indispensable preprocessing step for data mining and machine learning, especially in the era of big data. In practice, however, features may be generated dynamically and arrive one at a time; we call these streaming features. Most existing streaming feature selection methods assume either that all dynamically generated features are of the same type or that the type of each newly arriving feature is known in advance, which is unrealistic. This paper is therefore the first to study the practical problem of Online Heterogeneous Streaming Feature Selection without feature type information before learning, named OHSFS. Specifically, we first model streaming feature selection as a minimax problem. Then, based on the MIC (Maximal Information Coefficient), we derive a new metric, $MIC_{Gain}$, to determine whether a new streaming feature should be selected. To improve the efficiency of OHSFS, we present the metric $MIC_{Cor}$, which can directly discard low-correlation features. Finally, extensive experimental results indicate the effectiveness of OHSFS. Moreover, OHSFS is nonparametric and does not need to know the feature type before learning, which matches the needs of practical applications.
Keywords
Online Feature Selection, Streaming Feature, Heterogeneous Feature, Maximal Information Coefficient
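
The abstract describes a two-stage decision for each arriving feature: a cheap filter that discards weakly correlated features, followed by a test of whether the feature is worth adding. The sketch below illustrates only that control flow. It does not implement MIC, $MIC_{Gain}$, or $MIC_{Cor}$, whose definitions are not given in the abstract. As a stand-in it uses a binned, normalized mutual information score, and the names `normalized_mi`, `select_streaming`, `min_relevance`, and `max_redundancy` are hypothetical. The fixed thresholds are a simplification for illustration; OHSFS itself is described as nonparametric.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Equal-width binning for numeric data; other values are kept as categories."""
    if not all(isinstance(v, (int, float)) for v in values):
        return list(values)
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def normalized_mi(x, y, bins=4):
    """Stand-in dependence score in [0, 1] (NOT MIC): MI / min(H(x), H(y))."""
    xd, yd = discretize(x, bins), discretize(y, bins)
    n = len(xd)
    px, py, pxy = Counter(xd), Counter(yd), Counter(zip(xd, yd))
    mi = sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())
    hx = -sum(c / n * math.log(c / n) for c in px.values())
    hy = -sum(c / n * math.log(c / n) for c in py.values())
    denom = min(hx, hy)
    return mi / denom if denom > 0 else 0.0


def select_streaming(feature_stream, labels, min_relevance=0.1, max_redundancy=0.9):
    """Process features one at a time without being told their type."""
    selected = {}
    for name, values in feature_stream:
        # Stage 1: discard features weakly related to the label.
        if normalized_mi(values, labels) < min_relevance:
            continue
        # Stage 2: keep the feature only if it is not near-redundant
        # with a feature that was already selected.
        if any(normalized_mi(values, kept) >= max_redundancy for kept in selected.values()):
            continue
        selected[name] = values
    return list(selected)


labels = [0, 0, 0, 0, 1, 1, 1, 1]
stream = [
    ("f1", [0.1, 0.2, 0.3, 0.4, 0.9, 1.0, 1.1, 1.2]),   # numeric, informative
    ("f2", [1, 2, 3, 4, 9, 10, 11, 12]),                # numeric, redundant with f1
    ("f3", ["a", "b", "a", "b", "a", "b", "a", "b"]),   # categorical, uninformative
]
result = select_streaming(stream, labels)
print(result)
```

The toy stream mixes numeric and categorical features to mirror the heterogeneous setting; the same scoring function is applied to both, so no feature type has to be declared in advance.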