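Machine learning
Artificial intelligence
Computer science
AdaBoost
Malaria
Random forest
Support vector machine
Feature (linguistics)
Gradient boosting
Multilayer perceptron
Boosting (machine learning)
Oversampling
Data mining
Artificial neural network
Medicine
Philosophy
Bandwidth (computing)
Immunology
Linguistics
Computer network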
Authors
You Won Lee,Jae Woo Choi,Eun‐Hee Shin
Identifier
DOI:10.1016/j.compbiomed.2020.104151
Abstract
Rapid diagnosis is crucial for controlling malaria. Various studies have aimed at developing machine learning models that diagnose malaria from blood smear images; however, this approach has many limitations. This study developed a machine learning model for malaria diagnosis based on patient information. To construct the datasets, we extracted patient information from PubMed abstracts published between 1956 and 2019. We used two datasets: a parasitic-disease-only dataset and a total dataset that also includes information about other diseases. We compared six machine learning models: support vector machine, random forest (RF), multilayered perceptron, AdaBoost, gradient boosting (GB), and CatBoost. In addition, the synthetic minority oversampling technique (SMOTE) was employed to address the data imbalance problem. On the parasitic-disease-only dataset, RF was the best model whether or not SMOTE was applied. On the total dataset, GB performed best without SMOTE, whereas RF performed best after SMOTE was applied. On the imbalanced data, nationality was the most important feature for malaria prediction; on the data balanced with SMOTE, the most important feature was symptom. The results demonstrated that machine learning techniques can be successfully applied to predict malaria from patient information.
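The abstract relies on SMOTE to counter class imbalance. SMOTE creates synthetic minority-class samples by picking a minority sample, choosing one of its k nearest minority-class neighbours, and placing a new point at a random position on the line segment between them. The pure-Python sketch below illustrates that idea under the assumption of purely numeric feature vectors. It is not the authors' code: the function name `smote`, its parameters, and the toy data are hypothetical, and categorical features such as nationality or symptom would first need numeric encoding or a mixed-type variant such as SMOTE-NC. In practice a maintained implementation such as `SMOTE` from the imbalanced-learn package would usually be preferred.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Hypothetical SMOTE sketch: return n_new synthetic minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Assumes all features are numeric.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    rng = random.Random(seed)  # seeded for reproducible output

    # Precompute each sample's k nearest minority neighbours (excluding itself).
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _euclidean(x, minority[j]))
        neighbours.append(others[:k])

    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # base minority sample
        j = rng.choice(neighbours[i])      # one of its nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        x, nn = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    # Toy imbalanced data (illustrative only): 3 minority vs 10 majority samples.
    minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
    majority_count = 10
    new_points = smote(minority, n_new=majority_count - len(minority), k=2, seed=42)
    print(len(minority) + len(new_points))  # 10
```

Oversampling of this kind is normally applied only to the training split, so that synthetic points do not leak into the data used to evaluate the classifiers.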