Empirical Comparison of the Feature Evaluation Methods Based on Statistical Measures

IEEE Access (2021)

Abstract
One of the most important aspects of classification problems is selecting proper features, i.e., features that describe the classified object in the most straightforward way possible. One of the biggest challenges of feature selection is then evaluating the quality of the features, and there is a plethora of feature evaluation methods in the literature. This paper presents the results of a comparison between nine selected feature evaluation methods, both existing in the literature and newly defined. For the comparison, features from ten different data sets were evaluated by every method. Then, from every feature set, the best subset (according to each method) was chosen. Those subsets were used to train a set of classifiers (including decision trees and forests, linear discriminant analysis, naive Bayes, support vector machines, k-nearest neighbors, and an artificial neural network). The maximum accuracy of those classifiers, as well as the standard deviation between their accuracies, was used as a quality measure of each particular method. Furthermore, it was determined which method is the most universal with respect to the data set, i.e., for which method the obtained accuracies depended the least on the feature set. Finally, the computation time of each method was compared. The results indicate that for applications with limited computational power, the method based on the average overlap between feature values seems best suited: it led to high accuracies and proved fast to compute. However, if the data set is known to be normally distributed, the method based on the two-sample t-test may be preferable.
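The overlap-based criterion mentioned above can be sketched as follows. This is a minimal illustration only, assuming a histogram estimate of the overlap between a feature's value distributions in two classes; the paper's exact definition and its averaging over class pairs may differ. A lower overlap coefficient suggests the classes are easier to separate on that feature.

```python
import numpy as np

def overlap_coefficient(x_a, x_b, bins=20):
    """Histogram-based overlap of one feature's values across two classes.

    Builds normalized histograms of the feature in each class over a shared
    range and sums the bin-wise minima. Returns a value in [0, 1]:
    0 means disjoint distributions, 1 means identical histograms.
    """
    lo = min(x_a.min(), x_b.min())
    hi = max(x_a.max(), x_b.max())
    p, _ = np.histogram(x_a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(x_b, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    return float(np.minimum(p, q).sum())

# Illustration: a well-separated feature scores much lower (better)
# than a feature whose class distributions nearly coincide.
rng = np.random.default_rng(0)
separated = overlap_coefficient(rng.normal(0, 1, 1000), rng.normal(5, 1, 1000))
coincident = overlap_coefficient(rng.normal(0, 1, 1000), rng.normal(0.2, 1, 1000))
```

Ranking features by this score and keeping those with the smallest overlap is one cheap filter-style selection strategy, which matches the abstract's note that the overlap method suits settings with limited computational power.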
Keywords
Classification, dimensionality reduction, distribution overlap, feature evaluation, feature extraction, feature selection, filter methods, machine learning, overlap coefficient, pattern recognition