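Undersampling
Oversampling
Sampling (signal processing)
Computer science
Algorithm
Statistical classification
Artificial intelligence
Pattern recognition (psychology)
Data mining
Machine learning
Mathematics
Computer network
Computer vision
Filter (signal processing)
Bandwidth (computing)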
Authors
Ming Zheng, Tong Li, Liping Sun, Taochun Wang, Biao Jie, Weiyi Yang, Mingjing Tang, Changlong Lv
Identifier
DOI: 10.1016/j.knosys.2021.106800
Abstract
Imbalanced data are a common phenomenon in both theoretical research and real-world applications. Standard classification algorithms cannot effectively learn from and make predictions on imbalanced data; at the data level, this problem is generally addressed with oversampling, undersampling, or hybrid sampling methods. However, most current sampling methods use random sampling ratios, which can lead to undesirable and unstable classification performance. To obtain satisfactory and stable classification performance, we propose three algorithms, based on a genetic algorithm, that automatically determine the sampling ratios for oversampling, undersampling, and hybrid sampling methods. To test the algorithms' effectiveness, we ran experiments with five widely used standard classification algorithms on 14 imbalanced datasets, using two oversampling, two undersampling, and four hybrid sampling methods. Statistical tests showed that, for all five standard classification algorithms, the sampling methods that used our proposed algorithms achieved the best classification results. With the area under the receiver operating characteristic curve (AUC) as the evaluation metric, the proposed algorithms for automatically determining the sampling ratio outperformed random sampling ratios.
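To make the idea of "letting a genetic algorithm pick the sampling ratio" concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm: the abstract does not give the chromosome encoding, genetic operators, or the exact definition of the ratio. This sketch assumes a single real-valued ratio `r` in [0, 1], interpreted as "oversample the minority class until it has at least `r` times as many samples as the majority class", and a real-coded GA using truncation selection, blend crossover, and Gaussian mutation. The function names `random_oversample`, `auc`, and `ga_search_ratio` are hypothetical. In practice the fitness of `r` would be the validation AUC of a classifier trained on data resampled with `r`; the demo uses a toy fitness instead.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def random_oversample(X: Sequence, y: Sequence[int], ratio: float,
                      minority: int = 1,
                      rng: Optional[random.Random] = None) -> Tuple[List, List[int]]:
    """Duplicate random minority samples until n_minority >= ratio * n_majority.

    Assumed ratio definition (hypothetical, not taken from the paper).
    """
    rng = rng or random.Random(0)
    majority_idx = [i for i, lab in enumerate(y) if lab != minority]
    minority_idx = [i for i, lab in enumerate(y) if lab == minority]
    target = int(ratio * len(majority_idx))
    extra = max(0, target - len(minority_idx))  # no-op if already balanced enough
    picks = [rng.choice(minority_idx) for _ in range(extra)]
    idx = list(range(len(y))) + picks
    return [X[i] for i in idx], [y[i] for i in idx]


def auc(scores: Sequence[float], labels: Sequence[int], positive: int = 1) -> float:
    """ROC AUC via the Mann-Whitney U statistic; ties count as half a win."""
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ga_search_ratio(fitness: Callable[[float], float], lo: float = 0.0, hi: float = 1.0,
                    pop_size: int = 20, generations: int = 30,
                    mutation_sd: float = 0.05, seed: int = 0) -> float:
    """Real-coded GA that returns the ratio with the highest fitness seen."""
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Truncation selection: the top half survives unchanged (elitism).
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            w = rng.random()
            child = w * a + (1 - w) * b               # blend crossover
            child += rng.gauss(0.0, mutation_sd)      # Gaussian mutation
            children.append(min(hi, max(lo, child)))  # clip to the valid range
        pop = elite + children
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best


if __name__ == "__main__":
    # Toy fitness peaked at r = 0.6, standing in for validation AUC.
    print(round(ga_search_ratio(lambda r: -(r - 0.6) ** 2), 3))
```

A hybrid sampling method would need a two-gene chromosome (one oversampling and one undersampling ratio); the same loop extends to that case by treating each individual as a pair and applying crossover and mutation per gene.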