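Oversampling
Artificial intelligence
Undersampling
Overfitting
Computer science
Machine learning
AdaBoost
Classifier (UML)
Random forest
Pattern recognition (psychology)
Random subspace method
Decision boundary
Statistical classification
Class (philosophy)
Ensemble learning
Artificial neural network
Bandwidth (computing)
Computer network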
Authors
Mohammad Sarosh Basit,Adeeba Khan,Omar Farooq,Yusuf Uzzaman Khan,Mohammad Shameem
Identifier
DOI:10.1109/impact55510.2022.10029111
Abstract
An imbalanced dataset with class overlap is a challenging problem in medical research. Imbalanced data lead to overfitting to the majority class, while overlapping classes cause misclassification in both classes. This combination makes it difficult for classic machine learning algorithms to define a decision boundary between the minority and majority classes. In this study, we compare different algorithms and techniques for handling class imbalance together with class overlap: oversampling, undersampling, combined over- and undersampling, and ensemble methods. Two well-known, highly imbalanced and overlapped medical datasets are used to compare the approaches, and performance is evaluated by sensitivity and specificity. On the sleep apnea dataset, oversampling combined with the ensemble classifier AdaBoost achieved a specificity of 0.72 and a sensitivity of 0.46, which proved better than the other techniques and classifiers. On the diabetes dataset, SMOTE-Tomek oversampling combined with the Random Forest classifier achieved a specificity of 0.91 and a sensitivity of 0.77, which proved better than all the other combinations tried for classification with a minimal number of features.
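The two evaluation metrics named in the abstract, and the simplest form of oversampling, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `random_oversample` and `sensitivity_specificity` are hypothetical, the data are toy values, and the oversampler shown here simply duplicates minority rows, whereas SMOTE-based methods synthesize new points instead.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one (illustrative only)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary problem.

    sensitivity = TP / (TP + FN), the recall on the positive class
    specificity = TN / (TN + FP), the recall on the negative class
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy data: four majority-class rows (label 0), one minority row (label 1).
    X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
    y = [0, 0, 0, 0, 1]
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))

    # Toy predictions, only to exercise the metric function.
    print(sensitivity_specificity([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

Resampling of this kind would normally be applied to the training split only, so that the sensitivity and specificity reported on held-out data reflect the original class distribution.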