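Computer science
Counterfactual thinking
Identifier
Vulnerability (computing)
Process (computing)
Source code
Data mining
Software
Software quality
Coding (set theory)
Set (abstract data type)
Computer security
Software development
Programming language
Epistemology
Philosophy
Authors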
Hongyu Kuang,Feng Yang,Long Zhang,Gaigai Tang,Lin Yang
Identifier
DOI:10.1109/scam59687.2023.00024
Abstract
Software vulnerability detection is critical to ensuring the security and reliability of software systems. However, traditional vulnerability detection approaches are often limited by the scarcity and insufficient diversity of labeled data. This research introduces a novel approach to overcome these challenges by using user-defined identifiers in source code to generate counterfactual training data. User-defined identifiers, such as variable and function names, carry essential information about the intentions and logic of the program. By perturbing these identifiers while preserving the syntactic and semantic structure of the code, we create a diverse set of counterfactual examples that simulate potential vulnerabilities. When combined with existing labeled data, these counterfactual examples enrich the training process for vulnerability detection models. To evaluate the effectiveness of our approach, we conduct experiments on various datasets, achieving state-of-the-art performance on the VulDeePecker and Draper datasets. Our approach also outperforms, in terms of accuracy, models that use the same pre-trained language model.
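The abstract does not describe the perturbation procedure itself, so the following is only a minimal sketch of what identifier perturbation could look like, not the authors' method. It assumes a regex-based renamer for C-like code; the function `rename_identifiers`, the placeholder prefix `var`, and the short `C_KEYWORDS` and `LIBRARY_NAMES` lists are all hypothetical and deliberately incomplete. A real tool would use a parser so that string literals, comments, and name collisions are handled correctly.

```python
import re

# Hypothetical, incomplete lists used only for this illustration.
C_KEYWORDS = {
    "int", "char", "void", "return", "if", "else",
    "for", "while", "sizeof", "struct", "const", "unsigned",
}
LIBRARY_NAMES = {"strcpy", "memcpy", "malloc", "free", "printf", "strlen"}

# Matches C-style identifiers; numeric literals such as 0x10 are not matched.
IDENT = re.compile(r"\b[A-Za-z_]\w*\b")


def rename_identifiers(code, prefix="var"):
    """Replace each user-defined identifier with a neutral placeholder.

    Keywords and the listed library functions are left untouched, so the
    token structure of the code is unchanged; only the names differ.
    Returns the perturbed code and the old-name -> new-name mapping.
    """
    mapping = {}

    def substitute(match):
        name = match.group(0)
        if name in C_KEYWORDS or name in LIBRARY_NAMES:
            return name
        if name not in mapping:
            mapping[name] = f"{prefix}{len(mapping)}"
        return mapping[name]

    return IDENT.sub(substitute, code), mapping


sample = (
    "void copy_input(char *user_buf) "
    "{ char local_buf[16]; strcpy(local_buf, user_buf); }"
)
perturbed, mapping = rename_identifiers(sample)
print(perturbed)
# void var0(char *var1) { char var2[16]; strcpy(var2, var1); }
```

In this sketch the perturbed sample keeps the same unsafe `strcpy` call as the original, so it would carry the same label; the renaming only removes the naming cues (`user_buf`, `local_buf`) that a model might otherwise rely on.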