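Computer science
Artificial intelligence
Machine learning
Random forest
Botnet
Data mining
Generalization
Domain (mathematical analysis)
Pattern recognition (psychology)
Mathematics
Internet
Mathematical analysis
World Wide Web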
Authors
Han Wang, Zhangguo Tang, Huanzhou Li, Jian Zhang, Shuangcheng Li, Junfeng Wang
Identifier
DOI:10.1016/j.comnet.2023.109992
Abstract
Malware often embeds domain generation algorithms (DGAs) to evade firewall interception and detection based on domain blacklists and whitelists, while hiding command and control (C&C) servers to tighten control over botnets. DGA domains are diverse and difficult to obtain, which results in highly unbalanced datasets. Domain names generated by different DGA families differ little at the sequence level, so their distinguishing features are difficult to extract. As a result, deep-learning-based DGA domain name classification models suffer from poor accuracy, poor generalization ability, and bloated model size. To address these problems, this paper presents a visual representation of sequence data and a DGA domain name classification model. First, the DGA domain name is mapped to the attention recurrence plot (Att_RP) proposed in this paper, which enriches the phase space features of the data and differentiates the key phase space features. Att_RP is then fed to the DGA domain name classification model (CI_GRU) proposed in this paper, which applies a data dimension transformation and then performs classification. Experiments show that the model's classification accuracy, F1_score, and recall for a variety of DGA families in the wild are all higher than 99%, and that the model can also accurately classify four types of crafted DGA families. Compared with similar models, the model has high classification accuracy, low time consumption, low generalization error, and high efficiency, and its size is less than one-tenth that of similar models.
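To make the idea of mapping a domain name to a recurrence plot more concrete, the following is a minimal sketch of a plain (classical) binary recurrence plot, not the Att_RP proposed in the paper, whose attention mechanism is not specified in the abstract. The character encoding (Unicode code points), the threshold `eps`, and the function names `encode_domain` and `recurrence_plot` are assumptions made purely for illustration.

```python
def encode_domain(domain: str) -> list[int]:
    """Map each character to its Unicode code point (an assumed encoding)."""
    return [ord(ch) for ch in domain.lower()]


def recurrence_plot(series: list[int], eps: int = 0) -> list[list[int]]:
    """Classical binary recurrence matrix: R[i][j] = 1 when |x_i - x_j| <= eps."""
    n = len(series)
    return [
        [1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


# Toy input: the repeated "a" shows up as off-diagonal ones.
rp = recurrence_plot(encode_domain("abca"))
for row in rp:
    print(row)
# [1, 0, 0, 1]
# [0, 1, 0, 0]
# [0, 0, 1, 0]
# [1, 0, 0, 1]
```

The resulting square matrix can be treated as a single-channel image, which is the kind of two-dimensional input a downstream classifier could consume. How Att_RP weights the key phase space features, and how CI_GRU transforms the data dimensions, would need to be taken from the paper itself.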