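Computer science
Deep learning
Artificial intelligence
Recurrent neural network
Encoding (set theory)
Machine learning
Binary classification
Similarity (geometry)
Binary number
Source code
Feature learning
Artificial neural network
Pattern recognition (psychology)
Support vector machine
Mathematics
Arithmetic
Operating system
Image (mathematics)
Set (abstract data type)
Programming language
Authors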
Donghai Tian,Xiaoqi Jia,Rui Ma,Shuke Liu,Wenjing Liu,Changzhen Hu
Identifier
DOI:10.1016/j.eswa.2020.114348
Abstract
Binary code similarity detection (BCSD) plays an important role in malware analysis and vulnerability discovery. Existing methods rely mainly on expert knowledge, which may not be reliable in some cases. More importantly, the detection accuracy of these methods is not satisfactory. To address these issues, we propose BinDeep, a deep learning approach to binary code similarity detection. The method first extracts the instruction sequence from a binary function and then uses an instruction embedding model to vectorize the instruction features. Next, BinDeep applies a Recurrent Neural Network (RNN) model to identify the specific types of the two functions to be compared. Based on this type information, BinDeep selects the corresponding deep learning model for the similarity comparison. Specifically, BinDeep uses Siamese neural networks that combine LSTM and CNN to measure the similarity of the two target functions. Unlike traditional deep learning models, our hybrid model takes advantage of both the spatial structure learning of the CNN and the sequence learning of the LSTM. The evaluation shows that our approach performs well at BCSD across cross-architecture, cross-compiler, cross-optimization, and cross-version binary code.
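
The Siamese comparison described in the abstract can be illustrated with a minimal sketch. It is not BinDeep's implementation: the hash-based `embed_instruction` and the mean-pooling `encode_function` are hypothetical stand-ins for the learned instruction embedding model and the LSTM+CNN encoder, and the names, the vector size `DIM`, and the use of cosine similarity are all assumptions made for illustration. The sketch only shows the structural idea that one shared encoder is applied to both functions before their vectors are compared.

```python
import hashlib
import math

DIM = 16  # hypothetical embedding size, not taken from the paper


def embed_instruction(instr: str) -> list[float]:
    """Map an instruction string to a fixed vector.

    Assumption: a deterministic hash replaces the learned
    instruction embedding model mentioned in the abstract.
    """
    digest = hashlib.sha256(instr.encode("utf-8")).digest()
    # Scale the first DIM bytes from [0, 255] to [-1, 1].
    return [(b / 255.0) * 2.0 - 1.0 for b in digest[:DIM]]


def encode_function(instructions: list[str]) -> list[float]:
    """Shared encoder applied to both inputs (the Siamese part).

    Assumption: mean pooling replaces the LSTM+CNN hybrid encoder.
    """
    if not instructions:
        return [0.0] * DIM
    vectors = [embed_instruction(i) for i in instructions]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def function_similarity(f1: list[str], f2: list[str]) -> float:
    """Encode both functions with the same encoder, then compare."""
    return cosine_similarity(encode_function(f1), encode_function(f2))


if __name__ == "__main__":
    func_a = ["push rbp", "mov rbp, rsp", "xor eax, eax", "pop rbp", "ret"]
    func_b = ["push rbp", "mov rbp, rsp", "mov eax, 0", "pop rbp", "ret"]
    print(function_similarity(func_a, func_b))
```

In a trained system the two stand-ins would be replaced by learned models, and the type-identification step from the abstract would choose which trained model pair to use; that selection step is omitted here.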