Weidong Zhou, Tianbo Wang, Guotao Huang, Xiaopeng Liang, Chunhe Xia, Xiaojian Li
Identifier
DOI: 10.1109/trustcom60117.2023.00211
Abstract
Data-driven deep learning methods have brought significant progress and potential to intrusion detection. However, two thorny problems arise from the characteristics of intrusion data: "multi-type features" and "data imbalance". The former means that forcibly and improperly transforming intrusion features that lie in distinct metric spaces can cause semantic loss and noise. The latter means that intrusion data is imbalanced in both quantity and quality owing to its complex spatial distribution. We propose a Hybrid Framework for Multi-type and Imbalance Data (HF-Mid) to address both problems. First, we divide the intrusion features into equivalent and non-equivalent groups and embed them sequentially using Supervised Paragraph Vector-Distributed Memory (SPV-DM), which excels at modeling co-occurrence relationships, and a Deep Neural Network (DNN), which is suited to modeling non-linear relationships, thereby solving the "multi-type features" problem. Second, we adopt a low-noise collective matrix factorization (CMF) model to fuse the two resulting feature sets and reduce their dimensionality. Finally, we employ a multiple classifier to detect intrusions. During classifier training, we design a genetic algorithm-based proportional sampling method that selects high-quality samples for each training batch, thus addressing the "data imbalance" problem. Experimental results demonstrate that the proposed framework achieves an average overall improvement of 5.9% in accuracy and 1.5% in false positive rate.
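The fusion step can be illustrated with a plain collective matrix factorization. This is a minimal sketch, not the paper's low-noise CMF: it assumes the two embeddings (the hypothetical arrays `x1` and `x2`, standing in for the SPV-DM and DNN outputs) describe the same samples row by row. Both views share one factor matrix `U`, whose rows act as the fused, reduced-dimension features, and the factors are fitted by alternating ridge-regularized least squares. The helper name `collective_mf`, the `rank`, the penalty `reg`, and the iteration count are illustrative assumptions.

```python
import numpy as np


def collective_mf(x1, x2, rank, n_iter=100, reg=1e-2, seed=0):
    """Jointly factorize x1 ~ U @ V1.T and x2 ~ U @ V2.T with a shared U.

    Hypothetical helper: plain ridge-regularized CMF solved by alternating
    least squares, not the low-noise variant described in the abstract.
    Returns (U, V1, V2); rows of U are the fused low-dimensional features.
    """
    rng = np.random.default_rng(seed)
    n = x1.shape[0]
    assert x2.shape[0] == n, "both views must describe the same samples"
    u = rng.standard_normal((n, rank))
    eye = reg * np.eye(rank)
    x = np.hstack([x1, x2])  # (n, d1 + d2): both views side by side
    for _ in range(n_iter):
        # With U fixed, each loading matrix is a ridge solution:
        # V.T = (U.T U + reg I)^-1 U.T X
        gram = u.T @ u + eye
        v1 = np.linalg.solve(gram, u.T @ x1).T  # (d1, rank)
        v2 = np.linalg.solve(gram, u.T @ x2).T  # (d2, rank)
        # With V1, V2 fixed, stacking them turns the U update into one ridge problem.
        v = np.vstack([v1, v2])  # (d1 + d2, rank)
        u = np.linalg.solve(v.T @ v + eye, v.T @ x.T).T  # (n, rank)
    return u, v1, v2
```

For example, two synthetic views generated from the same rank-3 latent factors can be fused into a 3-dimensional representation; the shape of `u` below is `(200, 3)`:

```python
rng = np.random.default_rng(1)
latent = rng.standard_normal((200, 3))
x1 = latent @ rng.standard_normal((3, 10))  # stand-in for one embedding
x2 = latent @ rng.standard_normal((3, 6))   # stand-in for the other embedding
u, v1, v2 = collective_mf(x1, x2, rank=3)
print(u.shape)
```

Sharing `U` across views is what makes the factorization "collective": each view constrains the same latent rows, so the fused features must explain both embeddings at once.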