Abstract
Acute pancreatitis (AP) is an inflammation of the pancreas that can be fatal or lead to further complications depending on the severity of the attack. Early detection of AP can help save lives by enabling closer monitoring, more rigorous treatment, and better allocation of resources. In this era of data and technology, instead of relying on manual scoring systems, researchers are employing advanced machine learning (ML) and data mining models to identify patients at high risk of mortality early. Existing work on AP mortality prediction is scarce, and the few studies that exist have many shortcomings and are impractical for clinical deployment.

In this research, we address these issues. One central problem is the lack of high-quality public AP datasets, which are crucial for training ML models effectively. The available datasets are small, have many missing values, and suffer from severe class imbalance. We augmented three public datasets, MIMIC-III, MIMIC-IV, and eICU, to obtain a larger dataset, and experiments showed that the augmented data trained classifiers better than the original small datasets did. Moreover, we employed emerging advanced techniques to handle the underlying issues in the data. The results showed that the iterative imputer is best for filling missing values in AP data, beating not only basic techniques but also KNN-based imputation. Class imbalance was first addressed with data downsampling, which appeared to give decent results on small test sets; however, numerous experiments on large test sets showed that downsampling produces misleading and poor results for AP. Next, we applied various techniques to upsample the data at two class splits, 50:50 and 70:30 (majority to minority). Four tabular generative adversarial networks, CTGAN, TGAN, CopulaGAN, and CTAB, and a variational autoencoder, TVAE, were deployed for synthetic data generation, and SMOTE was also utilized for upsampling.

The computational results showed that the Random Forest (RF) classifier outperformed all other classifiers on the 50:50 data generated by CTGAN, with an Fβ of 0.702 and a recall of 0.833. The results produced by RF on the TVAE dataset were comparable, with an Fβ of 0.698. With SMOTE-based upsampling, a deep neural network (DNN) performed best, with an Fβ score of 0.671.
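The iterative imputation mentioned above is available in scikit-learn as IterativeImputer, which models each feature that has missing values as a function of the remaining features and cycles over the columns. A minimal sketch, assuming default settings (the estimator and parameters used in this work are not stated, so the values below are illustrative):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (exposes IterativeImputer)
from sklearn.impute import IterativeImputer

# Toy feature matrix with missing lab values; np.nan marks the gaps.
X = np.array([
    [7.2, 140.0, np.nan],
    [6.8, np.nan, 11.5],
    [np.nan, 152.0, 13.0],
    [7.9, 148.0, 12.1],
])

# Each column with missing entries is regressed on the other columns,
# cycling until convergence or max_iter rounds.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)
print(X_filled)
```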
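For the GAN-based upsampling, one plausible recipe, sketched below with the open-source ctgan package, is to fit the generator on minority-class records only and then sample enough synthetic rows to reach the target split. The column names, epoch count, and sample size are illustrative assumptions, not the configuration actually used in this study:

```python
import pandas as pd
from ctgan import CTGAN

# Hypothetical minority-class (non-survivor) records; columns are illustrative.
minority = pd.DataFrame({
    "age": [71, 64, 80, 58, 77, 69],
    "wbc": [18.2, 15.7, 21.0, 13.4, 19.8, 16.1],
    "on_ventilator": [1, 0, 1, 1, 0, 1],
})

# Fit CTGAN on the minority class; discrete columns must be named explicitly.
model = CTGAN(epochs=10)  # kept small for demonstration
model.fit(minority, discrete_columns=["on_ventilator"])

# Sample enough synthetic rows to bring the minority class up to the desired
# split; in practice n_needed = majority_count - minority_count for 50:50.
n_needed = 100  # assumption for illustration
synthetic_minority = model.sample(n_needed)
balanced_minority = pd.concat([minority, synthetic_minority], ignore_index=True)
```

TVAE and CopulaGAN expose a similar fit/sample interface in the SDV ecosystem, which is one reason these models are convenient for tabular augmentation.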
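SMOTE, by contrast, interpolates new minority samples between existing minority-class neighbors rather than learning a generative model. A minimal sketch with imbalanced-learn, using a synthetic stand-in dataset (the 50:50 target below mirrors one of the splits above and is otherwise an assumption):

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Synthetic stand-in for an imbalanced cohort (roughly 9:1 majority:minority).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("before:", Counter(y))

# sampling_strategy=1.0 upsamples the minority class to a 50:50 split.
X_res, y_res = SMOTE(sampling_strategy=1.0, random_state=0).fit_resample(X, y)
print("after:", Counter(y_res))
```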
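Finally, the Fβ score used for evaluation weights recall more heavily than precision whenever β > 1, which suits mortality prediction, where missing a high-risk patient is costlier than a false alarm. The abstract does not state the β value, so β = 2 in the sketch below is an assumption:

```python
from sklearn.metrics import fbeta_score, recall_score

# Toy labels: 1 = in-hospital death, 0 = survival.
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]

# F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 2 (assumed) favors recall.
print("F_beta:", fbeta_score(y_true, y_pred, beta=2))
print("recall:", recall_score(y_true, y_pred))
```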