Development and Validation of Machine Learning Models for Identifying Prediabetes and Diabetes in Normoglycemia

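Keywords: Prediabetes; Artificial intelligence; Machine learning; Gradient boosting; Logistic regression; Support vector machine; Receiver operating characteristic; Random forest; Medicine; Population; Diabetes mellitus; Computer science; Boosting (machine learning); Categorical variable; Endocrinology; Type 2 diabetes; Environmental health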

Authors

Xiaodong Zhang, Weidong Yao, Dawei Wang, Wenqi Hu, Guang Zhang, Yongsheng Zhang

Source

Journal: Diabetes-metabolism Research and Reviews [Wiley]
Date: 2024-11-01; Volume (issue): 40 (8)

Identifier

DOI: 10.1002/dmrr.70003

Abstract

Background: Prediabetes and diabetes are both states of abnormal glucose metabolism (AGM) that can lead to severe complications. Early detection of AGM is crucial for timely intervention and treatment. However, fasting blood glucose (FBG), when used as a mass population screening method, may fail to identify individuals who have AGM despite normoglycemia. This study aimed to develop and validate machine learning (ML) models to identify AGM among individuals with normoglycemia using routine health check-up indicators.

Methods: Following the American Diabetes Association (ADA) criteria, participants with normoglycemia (FBG ≤ 5.6 mmol/L) were enrolled from 2019 to 2023 and divided into AGM and Normal groups using a glycosylated haemoglobin (HbA1c) level of 5.7% as the threshold. Data from 2019 to 2022 were divided into training and internal validation sets at a 7:3 ratio, while data from 2023 were used as the external validation set. Seven ML algorithms were used to build models for identifying AGM in the normoglycemic population: logistic regression (LR), random forest (RF), support vector machine (SVM), extreme gradient boosting machine (XGBoost), multilayer perceptron (MLP), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost). Model performance was evaluated using the area under the receiver operating characteristic curve (auROC) and the area under the precision-recall curve (auPR). The feature contributions to the optimal model were visualised using SHapley Additive exPlanations (SHAP). Finally, an intuitive and user-friendly interactive interface was developed.

Results: A total of 59,259 participants were enrolled in this study and divided into a training set of 32,810, an internal validation set of 14,060, and an external validation set of 12,389. The CatBoost model outperformed the others, with an auROC of 0.806 and 0.794 on the internal and external validation sets, respectively. Age was the most important feature influencing the performance of the CatBoost model, followed by fasting blood glucose, red blood cells, haemoglobin, body mass index, and the triglyceride-glucose index.

Conclusion: A well-performing ML model for identifying AGM in the normoglycemic population was built, offering significant potential for early intervention and treatment of AGM that would otherwise have been missed.
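The cohort definition, data split, and evaluation metrics described in the abstract can be illustrated with a minimal standard-library sketch. It is not the authors' code. The function names, the record layout (a `year` key), the use of `HbA1c >= 5.7` for the AGM group, a plain random shuffle for the 7:3 split, and average precision as the auPR estimate are all assumptions made for illustration; the abstract does not specify these details.

```python
import random

FBG_MAX_MMOL_L = 5.6  # normoglycemia cutoff stated in the abstract
HBA1C_AGM_PCT = 5.7   # HbA1c threshold stated in the abstract


def label_agm(fbg, hba1c):
    """Return None outside the normoglycemic cohort, else 1 (AGM) or 0 (Normal).

    Assumption: values at or above the HbA1c threshold count as AGM.
    """
    if fbg > FBG_MAX_MMOL_L:
        return None
    return 1 if hba1c >= HBA1C_AGM_PCT else 0


def split_cohort(records, train_fraction=0.7, seed=0):
    """2019-2022 -> training/internal validation (random 7:3); 2023 -> external."""
    development = [r for r in records if 2019 <= r["year"] <= 2022]
    external = [r for r in records if r["year"] == 2023]
    random.Random(seed).shuffle(development)  # assumed: simple random split
    cut = round(train_fraction * len(development))
    return development[:cut], development[cut:], external


def au_roc(labels, scores):
    """auROC as the probability that a positive outscores a negative (ties = 0.5).

    Quadratic in cohort size; fine for a sketch, slow for tens of thousands of rows.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, one common estimate of auPR (tied scores keep input order)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each positive
    return total / hits
```

Holding out the most recent year as the external set, rather than sampling it at random, tests whether a model still performs on data collected after the period it was trained on. Both metric functions assume the labels contain at least one positive, and `au_roc` additionally needs at least one negative.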
