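Interpretability
Class (philosophy)
Computer science
Artificial intelligence
Feature (linguistics)
Machine learning
Cross-validation
Set (abstract data type)
Predictive modelling
Data mining
Software bug
Software
Pattern recognition (psychology)
Philosophy
Linguistics
Programming language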
Authors
Hui Han,Qiao Yu,Yi Zhu,Shengyi Cheng,Yu Zhang
Identifier
DOI:10.1142/s0218194024500414
Abstract
The class overlap problem refers to instances from different categories overlapping heavily in the feature space. It is one of the challenges in improving the performance of software defect prediction (SDP). Existing studies on the impact of class overlap on SDP have focused mainly on within-project and cross-project defect prediction, and existing class overlap instance cleaning methods are not suitable for cross-version defect prediction. In this paper, we propose a class overlap instance cleaning method based on the Ratio of K-nearest neighbors with the Same Label (RKSL). The method removes instances with an abnormal neighbor ratio from the training set. Using the RKSL method, we investigate the impact of class overlap on the performance and interpretability of cross-version defect prediction models. The experimental results show that class overlap can significantly affect the performance of cross-version defect prediction models. The RKSL method can handle the class overlap problem in defect datasets, but it may affect the interpretability of the models. Based on an analysis of feature changes, we believe that cleaning class overlap instances can help models identify more important features.
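The idea of cleaning by same-label neighbor ratio can be illustrated with a minimal sketch. The abstract does not specify the distance metric, the value of K, or what counts as an "abnormal" ratio, so the Euclidean distance, the default `k=5`, the `threshold=0.5` cutoff, and the function names `same_label_ratio` and `clean_overlap` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def same_label_ratio(X, y, k=5):
    """For each instance, return the fraction of its k nearest
    neighbors (Euclidean, excluding itself) that share its label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances between all instances.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # An instance must not count as its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k closest instances for each row.
    nn_idx = np.argsort(dist, axis=1, kind="stable")[:, :k]
    same = y[nn_idx] == y[:, None]
    return same.mean(axis=1)


def clean_overlap(X, y, k=5, threshold=0.5):
    """Drop training instances whose same-label neighbor ratio is
    below `threshold` (an assumed definition of 'abnormal')."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = same_label_ratio(X, y, k) >= threshold
    return X[keep], y[keep], keep
```

In a cross-version setting, a sketch like this would be applied only to the training data from the earlier version, leaving the later version's test data untouched. The brute-force distance matrix is quadratic in the number of instances; a tree-based neighbor search would be a reasonable substitute for larger datasets.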