Journal: Journal of Medical Imaging and Health Informatics [American Scientific Publishers] | Date: 2020-02-14 | Volume (Issue): 10 (5): 1069-1075 | Cited by: 20
Identifiers
DOI: 10.1166/jmihi.2020.3000
Abstract
Importance: Diabetes is a chronic disease that can cause long-term damage to various parts of the body. To prevent diabetic complications, various attempts have been made to combine machine learning with medicine and build models that predict whether a patient has diabetes, but prediction of this disease still has room for improvement. Hybrid prediction models offer a novel approach and in most cases achieve better results than single classical machine learning algorithms.
Objective: To develop a high-accuracy model for predicting different onsets of type 2 diabetes. To this end, the integration of clustering and classification techniques is improved to help detect diabetes at an earlier stage without deleting observations with missing values, and to reduce insignificant features so that the most relevant features are obtained during data collection.
Methods: We implement a noise-reduction technique based on k-means clustering, followed by the random forest and XGBoost classifiers, to extract the unknown hidden features of the dataset and obtain more accurate results.
Results: Prediction accuracy is assessed by benchmarking our model against up-to-date predictive models and common classification algorithms. With an accuracy of 97.53% under 10-fold cross-validation, our T2ML model achieves higher accuracy than other experiments reported in the literature and than various conventional classification algorithms.
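The abstract does not spell out how the k-means noise-reduction step works, so the following is only a minimal sketch of one common interpretation, not the authors' T2ML implementation. It assumes that records are clustered into two groups, that each cluster is mapped to its majority class label, and that records whose own label disagrees with their cluster's majority are treated as noise and dropped before classification. The function names (`kmeans`, `filter_noise`), the farthest-point initialisation, and the toy data are all hypothetical and chosen for illustration.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    # Deterministic farthest-point initialisation (an illustrative choice,
    # used here so the example gives the same result on every run).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    assign = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_assign = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_assign == assign:
            break  # converged
        assign = new_assign
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def filter_noise(X, y, k=2):
    """Keep only rows whose label matches the majority label of their cluster.

    Assumption: this is one plausible reading of "k-means noise reduction";
    the abstract does not confirm the exact rule.
    """
    assign = kmeans(X, k)
    majority = {}
    for c in set(assign):
        labels = [label for label, a in zip(y, assign) if a == c]
        majority[c] = Counter(labels).most_common(1)[0][0]
    keep = [i for i, a in enumerate(assign) if majority[a] == y[i]]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (made up): two well-separated groups, with the fourth record
# deliberately given the wrong label to act as noise.
X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.0),
     (8.0, 8.0), (8.2, 7.9), (7.9, 8.1), (8.1, 8.0)]
y = [0, 0, 0, 1, 1, 1, 1, 1]

X_clean, y_clean = filter_noise(X, y)
print(len(X_clean), y_clean)
```

In a pipeline of the kind the abstract describes, the filtered records would then be passed to a classifier such as random forest or XGBoost and scored with 10-fold cross-validation. Note that filtering on labels before cross-validation can inflate the reported accuracy, so applying the filter inside each training fold is the more conservative design.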