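Computer science
Classifier (UML)
k-nearest neighbors algorithm
Training set
Pattern recognition (psychology)
Artificial intelligence
Data mining
Machine learning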
Authors
Yidi Wang,Zhibin Pan,Yiwei Pan
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems
[Institute of Electrical and Electronics Engineers]
Date: 2019-06-28
Volume (Issue): 31 (5): 1544-1556
Citations: 48
Identifier
DOI:10.1109/tnnls.2019.2920864
Abstract
The k-nearest neighbor (KNN) rule is a successful technique in pattern classification due to its simplicity and effectiveness. Because KNN is a supervised classifier, its classification performance usually suffers from low-quality samples in the training data set. Thus, training data set cleaning (TDC) methods are needed to enhance classification accuracy by cleaning out noisy, or even wrong, samples from the original training data set. In this paper, we propose a classification ability ranking (CAR)-based TDC method to improve the performance of a KNN classifier. The proposed classification ability function ranks a training sample in terms of its contribution, as one of their k nearest neighbors, to correctly classifying other training samples through the leave-one-out (LV1) strategy in the cleaning stage. A training sample that is likely to cause other samples to be misclassified during the KNN classifications under the LV1 strategy is considered to have lower classification ability and is cleaned out of the original training data set. Extensive experiments, based on ten real-world data sets, show that the proposed CAR-based TDC method can significantly reduce the classification error rates of KNN-based classifiers, while also reducing computational complexity thanks to a smaller cleaned training data set.
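The cleaning idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual classification ability function, so the scoring rule below is an assumption made purely for illustration: each training sample gains one point whenever it is among the k nearest neighbors of another sample with the same label (leave-one-out), and loses one point when the labels differ. The function names `classification_ability` and `clean_training_set`, and the `threshold` parameter, are hypothetical and do not come from the paper.

```python
import numpy as np

def classification_ability(X, y, k=3):
    """Assumed illustrative score, NOT the paper's CAR function.

    For every sample i, find its k nearest neighbors among the other
    samples (leave-one-out). Each neighbor j gets +1 if it shares the
    label of i (it votes correctly) and -1 otherwise.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    # Leave-one-out: a sample is never its own neighbor.
    np.fill_diagonal(dist, np.inf)
    scores = np.zeros(n, dtype=int)
    for i in range(n):
        neighbors = np.argsort(dist[i], kind="stable")[:k]
        for j in neighbors:
            scores[j] += 1 if y[j] == y[i] else -1
    return scores

def clean_training_set(X, y, k=3, threshold=0):
    """Keep samples whose assumed ability score is at least `threshold`."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = classification_ability(X, y, k)
    keep = scores >= threshold
    return X[keep], y[keep], scores
```

On a toy one-dimensional data set with two well-separated classes and one mislabeled point placed inside the wrong cluster, the mislabeled point votes incorrectly for all of its neighbors, receives a negative score, and is dropped. The choice of threshold and the +1/-1 weighting are design choices of this sketch; the paper ranks samples with its own function.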