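Overfitting
DNA methylation
Artificial neural network
Correlation
Artificial intelligence
Computer science
Epigenetics
Machine learning
Mathematics
Biology
Genetics
Gene
Geometry
Gene expression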
Authors
Lechuan Li,Chonghao Zhang,Shiyu Liu,Hannah Guan,Yu Zhang
Identifier
DOI:10.1109/tcbb.2021.3084596
Abstract
Aging is traditionally thought to be caused by complex and interacting factors such as DNA methylation. Traditional formulas for DNA methylation age are based on linear models, and little work has explored the effectiveness of neural networks, which can learn non-linear relationships. DNA methylation data typically has a feature space of hundreds of thousands of features and a much smaller number of biological samples. This leads to overfitting and poor generalization of neural networks. We propose the Correlation Pre-Filtered Neural Network (CPFNN), which uses Spearman correlation to pre-filter the input features before feeding them into neural networks. We compare CPFNN with statistical regressions (i.e., Horvath's and Hannum's formulas), neural networks with LASSO regularization and elastic net regularization, and Dropout Neural Networks. CPFNN outperforms these models by at least 1 year in terms of Mean Absolute Error (MAE), with an MAE of 2.7 years. We also test for association of epigenetic age with schizophrenia and Down syndrome (p=0.024 and , respectively). We find that for a large number of candidate features, such as genome-wide DNA methylation data, a key factor in improving prediction accuracy is to appropriately weight features that are highly correlated with the outcome of interest.
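The pre-filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the top-k selection rule, the value of k, the function names, and the synthetic data are all assumptions made for illustration, and the abstract does not state how CPFNN thresholds or weights the features. The sketch computes the Spearman correlation of each feature with age (as Pearson correlation on ranks, using NumPy only) and keeps the most strongly correlated columns; the filtered matrix would then serve as input to a neural network, which is omitted here.

```python
import numpy as np

def rank_columns(a):
    """Rank each column (0 = smallest). Ties are broken by position, not
    averaged; acceptable for continuous values, an assumption of this sketch."""
    return np.argsort(np.argsort(a, axis=0), axis=0).astype(float)

def spearman_with_target(X, y):
    """Spearman correlation of every column of X with y (Pearson on ranks)."""
    Xr = rank_columns(X)
    yr = rank_columns(y.reshape(-1, 1))[:, 0]
    Xc = Xr - Xr.mean(axis=0)
    yc = yr - yr.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return (Xc.T @ yc) / denom

def prefilter(X, y, k):
    """Return sorted indices of the k features with the largest |rho|.
    The top-k rule is hypothetical; the paper may use a different criterion."""
    rho = spearman_with_target(X, y)
    keep = np.argsort(-np.abs(rho))[:k]
    return np.sort(keep), rho

def mean_absolute_error(y_true, y_pred):
    """MAE, the evaluation metric named in the abstract."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Synthetic stand-in for methylation data: few samples, many features.
rng = np.random.default_rng(0)
n_samples, n_features = 60, 500
age = rng.uniform(20, 80, size=n_samples)
X = rng.normal(size=(n_samples, n_features))
# Make the first five features depend monotonically (non-linearly) on age.
for j in range(5):
    X[:, j] = np.tanh((age - 50) / 15) + 0.1 * rng.normal(size=n_samples)

keep, rho = prefilter(X, age, k=5)
X_filtered = X[:, keep]  # this reduced matrix would be the network input
print("selected feature indices:", keep)
```

Because Spearman correlation works on ranks, it captures monotone but non-linear dependence on age, which is one plausible reason to prefer it over Pearson correlation for this kind of filter.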