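Boosting (machine learning)
Computer science
Weighting
Data mining
Software
Machine learning
Transfer of learning
Artificial intelligence
Algorithm
Data modeling
Software bug
Predictive modelling
Database
Medicine
Radiology
Programming language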
Authors
Nazgol Nikravesh, Mohammad Reza Keyvanpour
Identifier
DOI:10.1109/iccke57176.2022.9960103
Abstract
As the number of software projects grows, predicting software defects becomes increasingly crucial. Within-project defect prediction models can be reliable when adequate historical data are available. However, during the early phases of software development, there are not enough data to train an effective predictor. Cross-project defect prediction (CPDP) uses information from previous mature projects (source data) to predict whether new software modules (target data) will be defective. CPDP models must account for the fact that data distributions differ between source and target projects. CPDP approaches usually reduce this distribution difference either by selecting training data or by applying transfer learning methods. Recent CPDP models use transfer learning to reduce distribution differences effectively, yet none of them has considered that the class-imbalanced nature of defect data may cause negative transfer. In this paper, a four-step model is proposed: the first three steps prepare the training data and their initial weights, and the fourth step applies an enhanced version of the transfer boosting algorithm. This algorithm takes the class imbalance of the data into account and updates the weights of the source data to improve prediction performance. Therefore, in addition to reducing the distribution discrepancy between source and target data, the model also addresses the class imbalance of defect data. Compared with four state-of-the-art CPDP models on fifteen projects from PROMISE, AEEEM, and SOFTLAB, the proposed model provided consistent and accurate predictions. Our proposed model achieved the best average results for both AUC and F-measure, and on some datasets the improvements exceeded 5%.
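The abstract does not spell out the enhanced transfer boosting algorithm or its imbalance-aware weighting. As a point of reference only, the Python sketch below shows the weight update of the classic TrAdaBoost algorithm (Dai et al., 2007), a widely used transfer boosting scheme, combined with a class-balanced initial weighting. Both functions (`class_balanced_init`, `tradaboost_update`) are hypothetical illustrations, not the authors' model; the sketch assumes binary labels (1 = defective), a small labelled target set, and a weak learner whose predictions are supplied by the caller.

```python
import math
from typing import Dict, List, Sequence, Tuple


def class_balanced_init(labels: Sequence[int]) -> List[float]:
    """Hypothetical initial weights: every class gets the same total mass.

    Each sample's weight is inversely proportional to its class frequency,
    so the rare (defective) class is not drowned out by the majority class.
    The weights sum to 1.
    """
    counts: Dict[int, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n_classes = len(counts)
    return [1.0 / (n_classes * counts[y]) for y in labels]


def tradaboost_update(
    weights: Sequence[float],
    y_true: Sequence[int],
    y_pred: Sequence[int],
    is_source: Sequence[bool],
    n_rounds: int,
) -> Tuple[List[float], float]:
    """One TrAdaBoost-style weight update (Dai et al., 2007).

    Misclassified source samples are shrunk by a fixed factor beta;
    misclassified target samples are boosted by 1 / beta_t, where beta_t is
    derived from the weighted error on the target samples only.
    Returns the new (unnormalised) weights and the target error eps.
    """
    n_source = sum(1 for s in is_source if s)
    # Fixed source factor from TrAdaBoost: 1 / (1 + sqrt(2 ln n / N)).
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))

    # Weighted error of the weak learner, measured on target samples only.
    tgt_total = sum(w for w, s in zip(weights, is_source) if not s)
    tgt_err = sum(
        w
        for w, yt, yp, s in zip(weights, y_true, y_pred, is_source)
        if not s and yt != yp
    )
    eps = tgt_err / tgt_total
    if eps >= 0.5:
        raise ValueError("weak learner is no better than chance on target data")
    eps_c = max(eps, 1e-10)  # avoid beta_t == 0 when the target error is zero
    beta_t = eps_c / (1.0 - eps_c)

    new_weights = []
    for w, yt, yp, s in zip(weights, y_true, y_pred, is_source):
        miss = 1 if yt != yp else 0
        if s:
            new_weights.append(w * beta ** miss)  # down-weight misfit source data
        else:
            new_weights.append(w * beta_t ** -miss)  # up-weight misfit target data
    return new_weights, eps


if __name__ == "__main__":
    # Toy data: 4 source + 4 target modules, label 1 = defective (minority).
    y_true = [1, 0, 0, 0, 1, 0, 0, 0]
    is_source = [True] * 4 + [False] * 4
    # Balance classes separately inside each domain (hypothetical choice).
    weights = class_balanced_init(y_true[:4]) + class_balanced_init(y_true[4:])
    y_pred = [0, 0, 0, 0, 1, 1, 0, 0]  # one source miss, one target miss
    new_w, eps = tradaboost_update(weights, y_true, y_pred, is_source, n_rounds=10)
    total = sum(new_w)
    print([round(w / total, 3) for w in new_w], round(eps, 3))
```

In a full boosting loop the weights would be normalised, a new weak learner trained on them, and the update repeated for `n_rounds` iterations. The fixed factor `beta` steadily shrinks the influence of source modules the learner keeps getting wrong, which is one common way of limiting negative transfer; how the paper modifies this rule for imbalanced defect data is not stated in the abstract.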