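Computer science
Random forest
Sampling (signal processing)
Data mining
Interpolation (computer graphics)
Set (abstract data type)
Machine learning
Artificial intelligence
Data set
Class (philosophy)
Ensemble learning
Base (topology)
Oversampling
Mathematics
Mathematical analysis
Motion (physics)
Computer network
Filter (signal processing)
Bandwidth (computing)
Computer vision
Programming language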
Authors
Qinghua Gu,Jingni Tian,Xuexian Li,Song Jiang
Identifier
DOI:10.1016/j.knosys.2022.109050
Abstract
In recent years, classification on imbalanced data sets has attracted considerable research attention, as such problems arise widely in industrial production and medical research. For highly imbalanced data sets, ensemble methods based on over-sampling are among the most competitive techniques in current research. However, an inappropriate sampling strategy can easily degrade model performance, increasing training complexity and causing over-fitting. This article proposes an equilibrium ensemble method (DCI-ISSA) with two novel techniques to overcome these shortcomings. First, an over-sampling approach, Data Center Interpolation (DCI), provides a balanced data set for each base learner, which shields the base learners from the impact of class imbalance. Second, a parameter optimization method for Random Forest (RF) uses the Improved Sparrow Search Algorithm (ISSA) to dynamically find the optimal parameters for different imbalanced data sets. These parameters improve the classification performance of the base classifiers and adapt to imbalanced data sets of different sizes. Experimental results show that the DCI-ISSA-RF model outperforms other well-known approaches on imbalanced data sets of various dimensions.
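The abstract does not spell out how Data Center Interpolation generates samples, so the following is only a minimal sketch of one plausible reading: each synthetic minority point is drawn on the line segment between a randomly chosen minority sample and the minority class centroid, until the minority class reaches a target size. The function names, the use of the class centroid as the "data center", and the uniform interpolation factor are all assumptions for illustration, not the paper's algorithm.

```python
import random
from typing import List, Sequence

Point = List[float]


def centroid(points: Sequence[Sequence[float]]) -> Point:
    """Component-wise mean of a non-empty collection of points."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dim)]


def center_interpolation_oversample(
    minority: Sequence[Sequence[float]], n_target: int, seed: int = 0
) -> List[Point]:
    """Return synthetic minority points (hypothetical helper, not the paper's DCI).

    Assumption: a synthetic point is base + t * (center - base), where base is
    a random minority sample, center is the minority centroid, and t is drawn
    uniformly from [0, 1).
    """
    rng = random.Random(seed)
    center = centroid(minority)
    synthetic: List[Point] = []
    # Keep generating until original + synthetic samples reach n_target.
    while len(minority) + len(synthetic) < n_target:
        base = rng.choice(minority)
        t = rng.random()
        synthetic.append([b + t * (c - b) for b, c in zip(base, center)])
    return synthetic


# Toy data: 10 majority points and 3 minority points in two dimensions.
majority = [[float(i), float(i % 3)] for i in range(10)]
minority = [[20.0, 20.0], [22.0, 21.0], [21.0, 23.0]]

new_points = center_interpolation_oversample(minority, n_target=len(majority))
balanced_minority = minority + new_points
```

Interpolating toward the centroid keeps every synthetic point inside the convex hull of the minority samples, which is one way to limit the noisy samples that a poorly chosen sampling strategy can introduce. The balanced set could then be passed to any base learner, such as a random forest whose hyperparameters are tuned separately.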