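Biobank
Bayes' theorem
Genome-wide association study
Computer science
Bayesian probability
Polygenic risk score
Machine learning
Statistics
Artificial intelligence
Computational biology
Evolutionary biology
Biology
Econometrics
Mathematics
Bioinformatics
Genetics
Single-nucleotide polymorphism
Genotype
Gene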
Authors
Haoyu Zhang, Junpeng Zhan, Jin Jin, Thomas U. Ahearn, Zhi Yu, Jared O’Connell, Yunxuan Jiang, Tony Chen, Montserrat García-Closas, Xihong Lin, Bertram L. Koelsch, Nilanjan Chatterjee
Identifier
DOI:10.1101/2022.03.24.485519
Abstract
Polygenic risk scores (PRS) increasingly predict complex traits; however, suboptimal performance in non-European populations raises concerns about clinical applications and health inequities. We developed CT-SLEB, a powerful and scalable method to calculate PRS using ancestry-specific GWAS summary statistics from multi-ancestry training samples, integrating clumping and thresholding, empirical Bayes and super learning. We evaluate CT-SLEB and nine alternative methods with large-scale simulated GWAS (∼19 million common variants) and datasets from 23andMe Inc., the Global Lipids Genetics Consortium, All of Us and UK Biobank involving 5.1 million individuals of diverse ancestry, with 1.18 million individuals from four non-European populations across thirteen complex traits. Results demonstrate that CT-SLEB significantly improves PRS performance in non-European populations compared to simple alternatives, with comparable or superior performance to a recent, computationally intensive method. Moreover, our simulation studies offer insights into sample size requirements and SNP density effects on multi-ancestry risk prediction.