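Biobank
Computer science
Regression
Computational biology
Population
Biology
Contrast (vision)
Regression analysis
Genome
Statistics
Genetics
Genome-wide association study
Machine learning
Artificial intelligence
Genotype
Mathematics
Gene
Sociology
Demography
Single-nucleotide polymorphism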
Authors
Joelle Mbatchou, Leland Barnard, Joshua Backman, Anthony Marcketta, Jack A. Kosmicki, Andrey Ziyatdinov, Christian Benner, Colm O’Dushlaine, Mathew Barber, Boris Boutkov, Lukas Habegger, Manuel A. R. Ferreira, Aris Baras, Jeffrey G. Reid, Gonçalo R. Abecasis, Evan K. Maxwell, Jonathan Marchini
Source
Journal: Nature Genetics
[Springer Nature]
Date: 2021-05-20
Volume (issue): 53 (7): 1097-1103
Citations: 671
Identifier
DOI: 10.1038/s41588-021-00870-7
Abstract
Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case-control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals.
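The abstract's two computational points, that only local segments of the genotype matrix need to be in memory and that multiple phenotypes can be analysed in parallel, can be illustrated with a minimal sketch. This is not REGENIE's implementation: the function name `blockwise_ridge_predictors`, the loader callback `load_block`, the fixed ridge penalty `lam`, and the simulated data are all assumptions made for illustration, and the sketch omits cross-validation, covariates, and the association-testing step.

```python
import numpy as np

def blockwise_ridge_predictors(load_block, n_blocks, Y, lam=1.0):
    """Fit a ridge regression per genotype block for all phenotypes at once.

    load_block(b) is a hypothetical loader returning the (n_samples, block_size)
    genotype segment for block b; only one segment is held in memory at a time.
    Y is an (n_samples, n_phenotypes) matrix of centred phenotypes.
    Returns an array of shape (n_blocks, n_samples, n_phenotypes).
    """
    n, p = Y.shape
    preds = np.empty((n_blocks, n, p))
    for b in range(n_blocks):
        X = load_block(b)                    # local segment only
        X = X - X.mean(axis=0)               # centre each variant
        A = X.T @ X + lam * np.eye(X.shape[1])
        # One linear solve yields effect estimates for every phenotype column.
        beta = np.linalg.solve(A, X.T @ Y)   # (block_size, n_phenotypes)
        preds[b] = X @ beta                  # per-block predictor for each trait
    return preds

# Simulated toy data (assumption): 200 samples, 60 variants, 3 phenotypes.
rng = np.random.default_rng(0)
n, m, block_size, p = 200, 60, 20, 3
G = rng.integers(0, 3, size=(n, m)).astype(float)   # genotypes coded 0/1/2
Y = rng.standard_normal((n, p))
Y = Y - Y.mean(axis=0)

preds = blockwise_ridge_predictors(
    lambda b: G[:, b * block_size:(b + 1) * block_size],
    m // block_size,
    Y,
)
print(preds.shape)
```

In this sketch the toy genotype matrix `G` sits fully in memory for convenience; in practice the loader would read each segment from disk, so peak memory scales with the block size rather than with the genome-wide matrix.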