过采样
欠采样
机器学习
人工智能
计算机科学
算法
噪音(视频)
班级(哲学)
自举(财务)
水准点(测量)
数据挖掘
模式识别(心理学)
数学
图像(数学)
计算机网络
大地测量学
带宽(计算)
计量经济学
地理
作者
Alex X. Wang,Stefanka Chukova,Binh P. Nguyen
标识
DOI:10.1016/j.asoc.2023.110895
摘要
Skewed class proportions in real-world datasets present a challenge for machine learning algorithms, as they have a tendency to correctly categorize the majority class while incorrectly classifying the minority class. Such classification disparities hold significant implications, particularly in predictive scenarios involving minority groups, where misclassifying minority instances could lead to adverse outcomes. To tackle this, class imbalance learning has gained attention, with the Synthetic Minority Oversampling Technique (SMOTE) being a notable approach that addresses class imbalance by generating synthetic instances for the minority class based on their feature space neighbors. Despite its effectiveness and simplicity, SMOTE is known to suffer from a noise propagation issue where noisy and uninformative samples are introduced. While various SMOTE variants, including hybrids with undersampling, have been developed to tackle this problem, identifying noisy samples in complex real-world datasets remains a challenge. To address this, our study introduces a new SMOTE-based hybrid approach called SMOTE-centroid displacement-based k-NN (SMOTE-CDNN). SMOTE-CDNN employs centroid displacement for class prediction, which is more robust against noisy data. After SMOTE is applied, noise instances are detected and removed for clearer decision boundaries if their labels predicted by our centroid displacement-based k-NN algorithm are different from the real ones. While our experiments on 24 imbalance datasets demonstrate the resilience and efficiency of our proposed algorithm, which outperforms state-of-art resampling algorithms with various classification models, we acknowledge the need for further investigation into specific dataset characteristics and classification scenarios to determine the generalizability of our approach.
科研通智能强力驱动
Strongly Powered by AbleSci AI