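Computer science
Software bug
Exploit
Security token
Software
Source code
Artificial intelligence
Machine learning
Encoding (set theory)
Feature (linguistics)
Data mining
Salience
Node (physics)
Semantic gap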
Authors
Md Nasir Uddin,Bixin Li,Zafar Ali,Pavlos Kefalas,Inayat Khan,Islam Zada
Identifier
DOI:10.1007/s00500-022-06830-5
Abstract
In recent years, software defect prediction systems have become popular because they improve software reliability by identifying potential bugs in the code. Several models that aim to support developers have been introduced in the literature. Unfortunately, these models rely on manually constructed code features that are fed into machine learning-based classifiers. Moreover, these baseline approaches ignore the semantic and contextual information of the source code. In this paper we present a software defect prediction model that addresses these issues. The model, SDP-BB, combines a bidirectional long short-term memory network (BiLSTM) with BERT-based semantic features to capture the semantics of code and predict defects in the corresponding software. In particular, it uses the BiLSTM to exploit contextual information from the embedded token vectors learned through the BERT model. It also uses an attention mechanism to capture salient features of the nodes. This is done through a data augmentation technique that generates more training data. We evaluated our approach against state-of-the-art models on ten open-source projects in terms of F1-score in fault prediction. The experiments evaluated the performance of full-token and AST-node data processing methods, with the length coverage on each project varied from 50% to 90%, in both within-project defect prediction (WPDP) and cross-project defect prediction (CPDP) settings. The results indicate that the proposed method outperforms competing models.
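The abstract reports results as F1-scores, so a minimal sketch of how that metric is computed for a binary defect classifier may help. This is not the authors' evaluation code: the function name `f1_score`, the convention that label 1 means "defective", and the sample labels are all assumptions made for illustration.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class, assuming 1 = defective and 0 = clean."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defects caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defects missed
    if tp == 0:
        # With no true positives, precision or recall is 0 (or undefined),
        # so F1 is reported as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for eight source files (illustrative only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(f1_score(y_true, y_pred))
```

Because F1 balances precision against recall, it is a common choice for defect datasets, where defective files are usually the minority class and plain accuracy can be misleading.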