Computer science
Undersampling
Oversampling
Artificial intelligence
Pattern recognition (psychology)
Classifier (UML)
Support vector machine
AdaBoost
Data mining
Machine learning
Bandwidth (computing)
Computer network
Authors
Pengfei Sun, Zhiping Wang, Liyan Jia, Zhaohui Xu
Identifier
DOI:10.1016/j.eswa.2023.121848
Abstract
In recent years, class-imbalanced learning has become an important branch of machine learning. The Synthetic Minority Oversampling Technique (SMOTE) is a benchmark method for addressing imbalanced learning. Although SMOTE performs well on many data sets, it has the drawback of generating noisy samples. Many SMOTE variants attempt to solve this problem; specifically, they are hybrid sampling methods that apply an undersampling stage after SMOTE to remove noisy samples, which requires a method that can identify noise accurately enough to provide reliable performance. In this paper, a hybrid re-sampling method based on SMOTE and a two-layer nearest neighbor classifier (SMOTE-kTLNN) is proposed. SMOTE-kTLNN identifies noise by means of an Iterative-Partitioning Filter (IPF). Specifically, SMOTE is first applied to the original data to balance it; the balanced data is then divided into n equal parts, a kTLNN is built on each part to predict the whole data set, and noisy samples are removed according to a majority-voting rule. Finally, the balanced data sets are used to train kNN, AdaBoost, and SVM to verify that SMOTE-kTLNN is independent of the classifier. The experimental results demonstrate that SMOTE-kTLNN outperforms the compared methods on 25 binary data sets in terms of Recall, AUC, F1-measure, and G-mean.
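The oversample-then-filter pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the authors' implementation: the two-layer nearest neighbor classifier (kTLNN) is replaced by a plain scikit-learn KNeighborsClassifier, the Iterative-Partitioning Filter is reduced to a single filtering pass with majority voting, and names such as smote_with_partition_filter are illustrative only.

# Minimal sketch, assuming scikit-learn and imbalanced-learn are installed.
# The paper's kTLNN is NOT reproduced: KNeighborsClassifier is a stand-in,
# and the Iterative-Partitioning Filter is simplified to one filtering pass.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

def smote_with_partition_filter(X, y, n_parts=5, k=5, random_state=0):
    """Balance the data with SMOTE, then drop samples voted as noise.

    Filtering step (IPF-style): split the balanced data into n_parts folds,
    fit one classifier per fold, let every classifier predict the whole
    data set, and remove a sample when most classifiers mislabel it.
    """
    X_bal, y_bal = SMOTE(random_state=random_state).fit_resample(X, y)

    skf = StratifiedKFold(n_splits=n_parts, shuffle=True,
                          random_state=random_state)
    wrong_votes = np.zeros(len(y_bal), dtype=int)
    for _, part_idx in skf.split(X_bal, y_bal):       # each fold is one "part"
        clf = KNeighborsClassifier(n_neighbors=k)     # stand-in for kTLNN
        clf.fit(X_bal[part_idx], y_bal[part_idx])
        wrong_votes += (clf.predict(X_bal) != y_bal)  # count mislabel votes

    keep = wrong_votes <= n_parts // 2                # majority-voting rule
    return X_bal[keep], y_bal[keep]

if __name__ == "__main__":
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier

    # Synthetic imbalanced data purely for demonstration.
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)
    X_clean, y_clean = smote_with_partition_filter(X, y)
    final_clf = AdaBoostClassifier(random_state=0).fit(X_clean, y_clean)

The cleaned data returned by the filter can be fed to any downstream classifier (kNN, AdaBoost, or SVM, as in the paper) and evaluated with Recall, AUC, F1-measure, and G-mean.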