Malicious Domain Names Detection Algorithm Based On Lexical Analysis And Feature Quantification

IEEE ACCESS(2019)

引用 14|浏览12
暂无评分
摘要
Malicious domain names usually refer to a series of illegal activities, posing threats to people's privacy and property. Therefore, the problem of detecting malicious domain names has aroused widespread concerns. In this study, a malicious domain names detection algorithm based on lexical analysis and feature quantification is proposed. To achieve efficient and accurate detection, the method includes two phases. The first phase checks an observed domain name against a blacklist of known malicious uniform resource locator (URLs). The observed domain name is classified as being definitely malicious or potentially malicious based on its edit distances to the domain names on the blacklist. The second phase further evaluates a potential malicious domain name by its reputation value that represents its lexical feature and is calculated based on an N-gram model. The top 100,000 normal domain names in Alexa are used to obtain a whitelist substring set using the N-gram method in which each domain name excluding the top-level domain is segmented into substrings with the length of 3, 4, 5, 6 and 7. The weighted values of the substrings are calculated according to their occurrence counts in the whitelist substring set. A potential malicious domain name is segmented by the N-gram method and its reputation value is calculated based on the weighted values of its substrings. Finally, the potential malicious domain name is determined to be malicious or normal based on its reputation value. The effectiveness of the proposed detection method has been demonstrated by experiments on public available data.
更多
查看译文
关键词
Malicious domain names, N-gram, domain name substring, edit distance, reputation value
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要