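Oversampling
Computer science
Machine learning
Artificial intelligence
Class (philosophy)
Random forest
Field (mathematics)
Data mining
Statistical classification
Mathematics
Computer network
Pure mathematics
Bandwidth (computing)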
Authors
Saqib Ul Sabha, Assif Assad, Nusrat Mohi Ud Din, Muzafar Rasool Bhat
Identifier
DOI:10.1109/aisp57993.2023.10134981
Abstract
In imbalanced datasets, some classes contain many more samples than others, so samples are unevenly distributed across the classes. Because many important real-world classification problems, such as medical diagnosis, involve imbalanced data, the research community places a high priority on understanding how to learn from such data. If a machine learning model is trained directly on imbalanced data, the disparity between the majority and minority classes biases it towards the majority class and leads to inaccurate results. Interest in this field is growing, and several algorithms have been proposed. This study evaluates the effectiveness of five oversampling strategies intended to address data imbalance: random oversampling, SMOTE, Borderline-SMOTE, ADASYN, and DeepSMOTE. A comparative analysis is carried out, and each strategy is assessed in terms of evaluation metrics. The experimental results show that DeepSMOTE outperformed all other oversampling techniques on small, imbalanced datasets.
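The two simplest strategies named in the abstract can be illustrated in plain Python. This is a minimal sketch, not the implementation evaluated in the paper. It assumes that features are lists of floats, that neighbours are found by Euclidean distance, and that a fixed seed is used. The function names `random_oversample` and `smote` and the toy data are hypothetical. Random oversampling duplicates existing minority samples. SMOTE instead creates synthetic points on the line segment between a minority sample and one of its k nearest minority neighbours. Borderline-SMOTE and ADASYN keep this interpolation step but change which minority samples are used as seeds, and DeepSMOTE performs SMOTE-style interpolation in the latent space of an encoder/decoder.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(list(rng.choice(members)))  # exact copy of an existing sample
            y_out.append(label)
    return X_out, y_out


def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points by interpolating between a
    random minority sample and one of its k nearest minority neighbours
    (hypothetical helper; Euclidean distance assumed)."""
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(X_min) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        base = X_min[i]
        # indices of the k nearest other minority samples
        neighbours = sorted(
            (j for j in range(len(X_min)) if j != i),
            key=lambda j: math.dist(base, X_min[j]),
        )[:k]
        nb = X_min[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


# Toy data (hypothetical): class 1 is the minority with 3 of 8 samples.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
     [1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [1.2, 1.0]]
y = [1, 1, 1, 0, 0, 0, 0, 0]

X_ro, y_ro = random_oversample(X, y)  # both classes now have 5 samples
minority = [x for x, lab in zip(X, y) if lab == 1]
new_points = smote(minority, n_new=2, k=2)  # 2 synthetic class-1 points
```

Because SMOTE only interpolates between existing minority samples, every synthetic point stays inside the region spanned by the minority class. Random oversampling adds no new information and only reweights existing points. For real experiments, the `imbalanced-learn` package provides `RandomOverSampler`, `SMOTE`, `BorderlineSMOTE` and `ADASYN` through a common `fit_resample` interface.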