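Overfitting
Machine learning
Autoencoder
Artificial intelligence
Stability (learning theory)
Computer science
Disease
Population
Deep learning
Medicine
Artificial neural network
Pathology
Environmental health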
Authors
Qing Yang, Sunan Gao, Junfen Lin, Ke Lyu, Zexu Wu, Yuhao Chen, Yinwei Qiu, Yanrong Zhao, Wei Wang, Tianxiang Lin, Huiyun Pan, Ming Chen
Identifier
DOI:10.1186/s12859-022-04966-7
Abstract
Biological age (BA) has been recognized as a more accurate indicator of aging than chronological age (CA). However, current limitations include: insufficient attention to the incompleteness of medical data used to construct BA; a lack of machine learning-based BA (ML-BA) for the Chinese population; and neglect of how the degree of model overfitting affects the stability of association results.
Based on medical examination data from the Chinese population (45-90 years), we first evaluated the most suitable method for interpolating missing data, then constructed 14 ML-BAs based on biomarkers, and finally explored the associations between ML-BAs and health statuses (health risk indicators and disease). We found that round-robin linear regression interpolation performed best, while AutoEncoder showed the highest interpolation stability. We further illustrated the potential overfitting problem in ML-BAs, which affected the stability of ML-BAs' associations with health statuses. We then proposed a composite ML-BA based on the Stacking method with a simple meta-model (STK-BA), which overcame the overfitting problem and was associated more strongly with CA (r = 0.66, P < 0.001), health risk indicators, disease counts, and six types of disease.
We provided an improved aging measurement method for middle-aged and elderly groups in China, which can more stably capture aging characteristics other than CA, supporting the emerging application potential of machine learning in aging research.