Leveraging Feature Selection to Improve the Accuracy for Malware Detection

Research Square (2023)

Abstract
Malware is becoming increasingly sophisticated and difficult to detect with traditional monitoring tools and antivirus software. As a result, machine learning has become a popular approach for classifying and detecting malware-related data. In this study, two distinct datasets, Malware-Exploratory and CIC-MalMem-2022, were subjected to a series of supervised and unsupervised learning procedures to gather information for observation. As this work extends previous research, the model developed there is enhanced with feature selection using the Pearson correlation coefficient and a genetic algorithm. It is then tested against a newly created dataset, SMITH, and a GAN dataset produced from SMITH, along with the Malware-Exploratory and CIC-MalMem-2022 datasets from the previous work. The model still uses three clustering algorithms for analysis, namely K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Gaussian Mixture Model, and seven classification algorithms for predicting malware, namely Decision Tree, Random Forest, AdaBoost, K-Nearest Neighbors, Stochastic Gradient Descent, Extra Trees, and Gaussian Naïve Bayes. In the previous results, the raw Malware-Exploratory dataset achieved an accuracy score of 90%, while the raw CIC-MalMem-2022 dataset achieved a score of 99%. The results of this research show that the genetic algorithm is the best-performing feature selection method for detecting malware in the Malware-Exploratory and CIC-MalMem-2022 datasets, while the Pearson correlation coefficient performs well on the SMITH dataset.
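The abstract does not describe how the Pearson-based selection is configured, so the following is only a minimal sketch of one common form of it: keep the features whose absolute correlation with the class label meets a cutoff. The feature names, the toy values, the `select_features` helper, and the 0.5 threshold are all assumptions made for illustration and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no linear relationship with anything.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, labels, threshold=0.5):
    """Return names of features whose |r| with the label is >= threshold.

    `threshold` is a hypothetical cutoff chosen for this sketch.
    """
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, labels)) >= threshold
    ]


# Toy data invented for illustration (0 = benign, 1 = malware).
columns = {
    "handle_count": [1, 2, 3, 4, 5, 6],  # rises with the label
    "noise": [3, 1, 3, 1, 3, 1],         # weakly related to the label
    "constant": [7, 7, 7, 7, 7, 7],      # carries no information
}
labels = [0, 0, 0, 1, 1, 1]

selected = select_features(columns, labels)
print(selected)
```

On this toy data only `handle_count` clears the cutoff. A filter of this kind scores each feature independently, whereas a genetic algorithm searches over feature subsets, which is one plausible reason the two methods could rank differently across datasets.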
Keywords
feature selection, detection, accuracy