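Oversampling
Machine learning
Computer science
Fault (geology)
Artificial intelligence
Software
Data mining
Class (philosophy)
Software bug
Computer network
Bandwidth (computing)
Seismology
Programming language
Geology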
Authors
Santosh Singh Rathore, Satyendra Singh Chouhan, Dixit Kumar Jain, Aakash Gopal Vachhani
Identifier
DOI:10.1109/tr.2022.3158949
Abstract
Imbalanced software fault datasets, which contain fewer faulty modules than nonfaulty ones, make accurate fault prediction difficult, and handling such imbalanced fault data is a challenge for software practitioners performing software fault prediction (SFP). Several researchers have previously applied oversampling techniques, such as the synthetic minority oversampling technique (SMOTE) and others, for imbalanced learning in SFP. However, most of these techniques resulted in overfitted prediction models. This article presents generative oversampling methods to handle the imbalanced data problem in SFP. Using a generative adversarial network (GAN) based approach, the presented methods generate synthetic samples of faulty modules to balance the proportion of faulty and nonfaulty modules in the fault datasets. SFP models are then built on the processed fault datasets using different machine learning techniques. The presented oversampling methods are experimentally validated on 18 fault datasets gathered from the PROMISE, JIRA, and Eclipse data repositories, with precision, recall, F1-score, and AUC as evaluation measures. We extensively compared the presented oversampling methods with various state-of-the-art class imbalance techniques and baseline models. The experimental results show that the presented methods improved fault prediction performance and outperformed the state-of-the-art class imbalance techniques.
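The balancing step described above can be illustrated with a minimal, stdlib-only Python sketch of SMOTE-style interpolation, the classic oversampling baseline the abstract mentions. This is not the authors' GAN-based method: a GAN would replace the interpolation step with a trained generator. The function names `smote_like_oversample` and `balance`, the default `k=5`, and the representation of modules as lists of numeric metric values are hypothetical choices made for illustration only.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote_like_oversample(
    minority: List[Vector], n_new: int, k: int = 5, seed: int = 0
) -> List[Vector]:
    """Hypothetical helper: SMOTE-style interpolation, NOT the paper's GAN generator.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)  # seeded for reproducible output
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # indices of the k nearest minority neighbours of `base`, excluding itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _euclidean(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(
    faulty: List[Vector], nonfaulty: List[Vector], k: int = 5, seed: int = 0
) -> Tuple[List[Vector], List[Vector]]:
    """Oversample the faulty (minority) class until both classes have equal size."""
    n_new = max(0, len(nonfaulty) - len(faulty))
    if n_new == 0:
        return list(faulty), list(nonfaulty)
    return faulty + smote_like_oversample(faulty, n_new, k, seed), list(nonfaulty)


if __name__ == "__main__":
    faulty = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]          # 3 faulty modules
    nonfaulty = [[5.0 + i, 5.0] for i in range(10)]       # 10 nonfaulty modules
    faulty_bal, nonfaulty_bal = balance(faulty, nonfaulty, k=2)
    print(len(faulty_bal), len(nonfaulty_bal))
```

Because every synthetic point lies on a segment between two existing faulty modules, the generated data stays close to the original minority samples; this is one commonly cited reason interpolation-based oversampling can overfit, which the abstract offers as motivation for a generative approach.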