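Oversampling
Computer science
Random forest
Data mining
Artificial intelligence
Machine learning
Pattern recognition (psychology)
Telecommunications
Bandwidth (computing)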
Authors
Guoliang Wei, Weimeng Mu, Yan Song, Jun Dou
Identifier
DOI:10.1016/j.knosys.2022.108839
Abstract
Imbalanced data learning has become a major challenge in data mining and machine learning. Oversampling is an effective way to restore class balance by generating new samples. However, most oversampling methods perform poorly in the presence of noise and complicated distribution structures, and they readily generate redundant, unsafe, or outlier samples. To address this problem, we propose a novel oversampling method, the Improved and Random Synthetic Minority Oversampling Technique (IR-SMOTE). The core idea of IR-SMOTE is threefold: (1) after the minority class is partitioned by k-means clustering, noise samples in each cluster are removed by sorting the majority class samples in ascending order; (2) the number of synthetic samples assigned to each minority cluster is determined adaptively using kernel density estimation; and (3) based on the attributes of temporary synthetic samples obtained with Random-SMOTE, a new synthesis method is developed that generates new samples with guaranteed diversity. Finally, extensive comparison experiments on 18 well-known data sets illustrate the effectiveness and general applicability of IR-SMOTE for imbalanced data classification.
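The abstract describes steps (2) and (3) only at a high level. The following is a minimal NumPy sketch of how a density-based allocation and Random-SMOTE generation could fit together; it is not the authors' IR-SMOTE. Assumptions: the minority clusters are already given (e.g. from k-means after noise removal, which is not sketched here because the abstract does not specify it); the allocation rule (more synthetic samples for clusters with lower mean Gaussian kernel density), the bandwidth of 1.0, and the helper names `cluster_density`, `allocate_counts`, and `random_smote` are hypothetical; and the generation step is plain Random-SMOTE, not the paper's new synthesis method.

```python
import numpy as np


def cluster_density(points, bandwidth=1.0):
    """Mean Gaussian kernel density estimate over the points of one cluster."""
    d = points.shape[1]
    # Squared Euclidean distances between every pair of points.
    sq = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-sq / (2 * bandwidth ** 2)) / (2 * np.pi * bandwidth ** 2) ** (d / 2)
    # KDE value at each point, then averaged over the cluster.
    return kernel.mean(axis=1).mean()


def allocate_counts(clusters, n_total, bandwidth=1.0):
    """HYPOTHETICAL rule: sparser clusters (lower density) get more synthetic samples."""
    density = np.array([cluster_density(c, bandwidth) for c in clusters])
    weights = (1.0 / density) / np.sum(1.0 / density)
    exact = weights * n_total
    counts = np.floor(exact).astype(int)
    # Give the leftover samples to the clusters with the largest fractional parts.
    remainder = n_total - counts.sum()
    counts[np.argsort(-(exact - counts))[:remainder]] += 1
    return counts


def random_smote(cluster, n_new, rng):
    """Plain Random-SMOTE: t = y1 + r1 * (y2 - y1), then new = x + r2 * (t - x)."""
    # Needs at least 3 points per cluster to draw x, y1, y2 without replacement.
    out = np.empty((n_new, cluster.shape[1]))
    for i in range(n_new):
        x, y1, y2 = cluster[rng.choice(len(cluster), size=3, replace=False)]
        t = y1 + rng.random() * (y2 - y1)  # temporary sample on segment y1-y2
        out[i] = x + rng.random() * (t - x)  # final sample on segment x-t
    return out


rng = np.random.default_rng(0)
clusters = [rng.normal(0.0, 0.5, size=(20, 2)), rng.normal(5.0, 2.0, size=(10, 2))]
counts = allocate_counts(clusters, n_total=30)
synthetic = np.vstack([random_smote(c, k, rng) for c, k in zip(clusters, counts)])
print(counts, synthetic.shape)  # counts sum to 30; synthetic.shape == (30, 2)
```

Because the temporary sample `t` lies on the segment between `y1` and `y2`, and the new sample lies between `x` and `t`, every synthetic point is a convex combination of three minority samples from the same cluster and therefore stays inside that cluster's convex hull.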