Oversampling
Machine learning
Overfitting
Computer science
Artificial intelligence
Class (philosophy)
Ensemble learning
Boosting (machine learning)
Resampling
Context (archaeology)
Field (mathematics)
Binary classification
Data mining
Support vector machine
Artificial neural network
Mathematics
Computer network
Paleontology
Bandwidth (computing)
Pure mathematics
Biology
Authors
Somiya Abokadr, Azreen Azman, Hazlina Hamdan, N. Nurul Amelina
Identifier
DOI:10.1109/esmarta59349.2023.10293442
Abstract
Imbalanced data significantly impacts the efficacy of machine learning models. When one class greatly outweighs the other in sample count, models may develop a bias towards the majority class, undermining performance on the minority class. Imbalanced data also increases the risk of overfitting, as the model may memorize majority-class samples instead of learning underlying patterns. This paper addresses these challenges in the classification field by exploring various solutions, including under-sampling, oversampling, SMOTE, cost-sensitive learning, and ensemble deep learning methods. We evaluate the performance of these methods on different datasets and provide insights into their strengths and limitations. The paper presents a taxonomy of strategies for imbalanced binary and multi-class classification problems, including resampling, algorithmic, and hybrid methods. Ultimately, the paper furnishes guidelines to facilitate the selection of the most pertinent method for mitigating imbalanced data challenges within a specific classification context.
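For orientation, the following is a minimal sketch (not the paper's code) of two of the remedies the abstract mentions, using the widely available scikit-learn and imbalanced-learn libraries: SMOTE oversampling of the training split, and cost-sensitive learning via class weights. The synthetic dataset and the logistic-regression model are illustrative assumptions, not taken from the paper.

```python
# Sketch of two imbalance remedies named in the abstract (assumed setup,
# not the authors' implementation): SMOTE oversampling and class weighting.
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Synthetic binary dataset with a 95/5 class ratio.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# 1) Oversampling: SMOTE synthesizes new minority-class samples on the
#    training split only, never on the test split.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
print("before:", Counter(y_tr), "after:", Counter(y_res))
smote_clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)

# 2) Cost-sensitive learning: keep the data as-is but penalize errors on the
#    minority class more heavily via class_weight="balanced".
weighted_clf = LogisticRegression(max_iter=1000,
                                  class_weight="balanced").fit(X_tr, y_tr)

for name, clf in [("SMOTE", smote_clf), ("cost-sensitive", weighted_clf)]:
    print(name)
    print(classification_report(y_te, clf.predict(X_te), digits=3))
```

As the paper's taxonomy suggests, the first approach changes the data distribution (resampling) while the second changes the learning objective (algorithmic); hybrid methods combine both.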