Supervised and Unsupervised Learning Techniques Utilizing Malware Datasets.

Daryle Smith, Sajad Khorsandroo, Kaushik Roy

2023 IEEE 2nd International Conference on AI in Cybersecurity (ICAIC) (2023)

Abstract

Malware continues to gain momentum as it becomes more sophisticated at evading detection. Monitoring tools and antivirus software cannot keep up with the constant changes in these malicious variants. As a result, machine learning has gained popularity for classifying and detecting malware-related data. In this study, two separate datasets, Malware-Exploratory and CIC-MalMem-2022, undergo a series of supervised and unsupervised learning procedures, first to gather information for observation. The model developed in this research uses three clustering algorithms for analysis: K-Means, DBSCAN, and GMM. It also uses seven classification algorithms to predict malware: Decision Tree, Random Forest, AdaBoost, KNeighbors, Stochastic Gradient Descent, Extra Trees, and Gaussian Naïve Bayes. Results show that the Malware-Exploratory dataset averaged an accuracy score of 90%, while the CIC-MalMem-2022 dataset averaged 99%. Both datasets also showed consistent results across all three clustering algorithms. In addition, the variables do not need to be highly correlated with one another for malware detection to work. Future studies will determine whether the results remain stable when feature selection and genetic algorithms are applied.
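
The pipeline described in the abstract (three clustering algorithms followed by seven classifiers) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code. The data is a synthetic stand-in produced by `make_classification` rather than Malware-Exploratory or CIC-MalMem-2022, and every hyperparameter (`n_clusters`, `eps`, `min_samples`, the train/test split, the random seeds) is an assumption chosen only so the example runs. The accuracies it prints say nothing about the results reported in the paper.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import SGDClassifier
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a labelled malware dataset (assumption: binary labels).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale features; the scaler is fitted on the training split only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Unsupervised step: the three clustering algorithms named in the abstract.
# All parameter values here are illustrative assumptions.
clusterers = {
    "KMeans": KMeans(n_clusters=2, n_init=10, random_state=0),
    "DBSCAN": DBSCAN(eps=2.0, min_samples=5),
    "GMM": GaussianMixture(n_components=2, random_state=0),
}
cluster_counts = {}
for name, model in clusterers.items():
    labels = model.fit_predict(X_train_s)
    # DBSCAN labels noise points as -1, so exclude that label from the count.
    cluster_counts[name] = len(set(labels.tolist()) - {-1})

# Supervised step: the seven classifiers named in the abstract,
# with default settings apart from fixed random seeds.
classifiers = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "KNeighbors": KNeighborsClassifier(),
    "SGD": SGDClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
    "Gaussian NB": GaussianNB(),
}
scores = {}
for name, model in classifiers.items():
    model.fit(X_train_s, y_train)
    # score() returns mean accuracy on the held-out split.
    scores[name] = model.score(X_test_s, y_test)

for name, count in cluster_counts.items():
    print(f"{name}: {count} cluster(s) found")
for name, acc in scores.items():
    print(f"{name}: accuracy {acc:.3f}")
```

To try the sketch on a real dataset, `X` and `y` would be replaced by the feature matrix and label column loaded from that dataset. The DBSCAN `eps` value in particular would likely need retuning, since the number of clusters it finds is sensitive to feature scale and density.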

Keywords

area under the curve-receiver operating characteristics (AUC-ROC), density-based spatial clustering of applications with noise (DBSCAN), Gaussian Mixture Model (GMM), hierarchical density-based spatial clustering of applications with noise (HDBSCAN), supervised machine learning, unsupervised machine learning
