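Computer science
Program slicing
Slicing
Variable (mathematics)
Vulnerability (computing)
Encoding (set theory)
Programming language
Computer security
Computer graphics (images)
Set (abstract data type)
Mathematical analysis
Mathematics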
Authors
Tongshuai Wu,Liwei Chen,Gewangzi Du,Dan Meng,Gang Shi
Identifier
DOI:10.1109/tifs.2024.3374219
Abstract
Detecting vulnerabilities in source code with deep learning models is emerging as a valuable research area. The key issue is representing vulnerabilities accurately. Current approaches for detecting vulnerabilities in C/C++ programs use functions or lines of code as the unit of analysis and consider only the basic syntactic structure of vulnerabilities. Unfortunately, functions and lines of code still contain vulnerability-unrelated information, which is redundant with respect to vulnerability features and makes it harder for deep learning models to learn accurate vulnerability patterns. This paper analyzes the essential features of vulnerabilities and attacks in depth. We then propose a novel variable-based deep learning vulnerability detection method for C/C++ that is finer-grained than existing function-based or line-of-code-based methods. Based on the triggering mechanism of vulnerabilities and typical memory attacks, we introduce the concepts of key variables and insecure operations, and use them to define new rules for determining the center point of code slices so that the slices carry more accurate vulnerability features. Using the new center point, we propose the first ultra-fine-grained variable-based code slicing (UltraVCS) method, which focuses on the vulnerability-related variable. This method removes as much vulnerability-unrelated information as possible to achieve more accurate vulnerability feature extraction. Experiments show that, compared to state-of-the-art methods, our approach generates more code slices, achieves a more precise vulnerability representation, and performs better at vulnerability detection in open-source projects. Furthermore, we have discovered four zero-day vulnerabilities in real-world application scenarios in open-source projects.
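The idea of centering a slice on a variable rather than on a whole statement can be illustrated with a toy sketch. This is not the UltraVCS algorithm, whose rules the abstract does not spell out; it is a plain backward data-dependence slice over straight-line code, and the `Stmt` record, the `backward_slice` function, the hand-written def/use sets, and the sample C lines are all assumptions made for illustration.

```python
from typing import NamedTuple


class Stmt(NamedTuple):
    """One statement with hand-annotated def/use sets (hypothetical format)."""
    line: int
    text: str
    defs: frozenset  # variables written by the statement
    uses: frozenset  # variables read by the statement


def backward_slice(stmts, center_line, key_vars):
    """Backward data-dependence slice over straight-line code.

    Starts from the chosen variables at the center statement and keeps
    only earlier statements that define a variable still needed.
    Control flow, pointers, and aliasing are ignored in this sketch.
    """
    needed = set(key_vars)
    kept = [center_line]
    earlier = [s for s in stmts if s.line < center_line]
    for s in reversed(earlier):
        if s.defs & needed:
            kept.append(s.line)
            # The definition satisfies the need; its inputs become needed.
            needed = (needed - s.defs) | s.uses
    return sorted(kept)


# Toy C fragment; memcpy at line 7 plays the role of an insecure operation.
PROGRAM = [
    Stmt(1, "char buf[16];", frozenset({"buf"}), frozenset()),
    Stmt(2, "int n = read_len();", frozenset({"n"}), frozenset()),
    Stmt(3, "int flag = 0;", frozenset({"flag"}), frozenset()),
    Stmt(4, 'log_msg("start");', frozenset(), frozenset()),
    Stmt(5, "n = n * 2;", frozenset({"n"}), frozenset({"n"})),
    Stmt(6, "flag = 1;", frozenset({"flag"}), frozenset()),
    Stmt(7, "memcpy(buf, src, n);", frozenset({"buf"}), frozenset({"src", "n"})),
]

# Slice centered on the single variable n at line 7.
variable_slice = backward_slice(PROGRAM, 7, {"n"})
# Slice centered on every variable appearing in the statement at line 7.
statement_slice = backward_slice(PROGRAM, 7, {"n", "buf", "src"})
```

In this toy case the slice centered on `n` keeps lines 2, 5, and 7, while the slice centered on the whole statement also pulls in the declaration at line 1. That is the sense in which a variable-level center point can drop more unrelated code than a statement-level one.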