基础(证据)
DNA甲基化
计算生物学
DNA
生物
遗传学
计算机科学
历史
基因
考古
基因表达
作者
Lucas Paulo de Lima Camillo,Raghav Sehgal,Judith Armstrong,Albert Higgins‐Chen,Steve Horvath,Bo Wang
标识
DOI:10.1101/2024.10.24.619766
摘要
DNA methylation is a critical epigenetic modification that regulates gene expression and plays a significant role in development and disease processes. Here, we present the Cytosine-phosphate-Guanine Pretrained Transformer (CpGPT), a novel foundation model pretrained on over 1,500 DNA methylation datasets encompassing over 100,000 samples from diverse tissues and conditions. CpGPT leverages an improved transformer architecture to learn comprehensive representations of methylation patterns, allowing it to impute and reconstruct genome-wide methylation profiles from limited input data. By capturing sequence, positional, and epigenetic contexts, CpGPT outperforms specialized models when finetuned for aging-related tasks, including chronological age prediction, mortality risk, and morbidity assessments. The model is highly adaptable across different methylation platforms and tissue types. Furthermore, analysis of sample-specific attention weights enables the identification of the most influential CpG sites for individual predictions. As a foundation model, CpGPT sets a new benchmark for DNA methylation analysis, achieving strong performance in the Biomarkers of Aging Challenge, where it placed second overall in chronological age estimation and first on the public leaderboard in methylation-based mortality prediction.
科研通智能强力驱动
Strongly Powered by AbleSci AI