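Phenotype
Type I and type II errors
Genome-wide association study
Hierarchical clustering
Single nucleotide polymorphism
SNP
Biobank
Association test
Genetic association
Cluster analysis
Computational biology
Computer science
Biology
Genetics
Statistics
Mathematics
Genotype
Gene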
Authors
Hongjing Xie, Xuewei Cao, Shuanglin Zhang, Qiuying Sha
Abstract
In genome-wide association studies (GWAS) of thousands of phenotypes in biobanks, most binary phenotypes have substantially fewer cases than controls. Many widely used approaches for the joint analysis of multiple phenotypes produce inflated type I error rates for such extremely unbalanced case-control phenotypes. In this research, we develop a method that jointly analyzes multiple unbalanced case-control phenotypes and circumvents this issue. We first group the phenotypes into clusters using a hierarchical clustering method, and then merge the phenotypes in each cluster into a single phenotype. Within each cluster, we use the saddlepoint approximation to estimate the p value of an association test between the merged phenotype and a single nucleotide polymorphism (SNP), which eliminates the inflated type I error rate of the test for extremely unbalanced case-control phenotypes. Finally, we use the Cauchy combination method to integrate the p values from all clusters into a single p value that tests the association between the multiple phenotypes and the SNP. We evaluate the performance of the proposed approach in extensive simulation studies. The results show that the proposed approach controls the type I error rate very well and is more powerful than the other available methods. We also apply the proposed approach to the phenotypes in category IX (diseases of the circulatory system) of the UK Biobank, where it identifies more significant SNPs than the other viable methods we compared it with.
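The final step described above, integrating the per-cluster p values with the Cauchy combination method, can be illustrated with a minimal sketch. It uses the standard form of the Cauchy combination statistic, T = sum of w_i * tan((0.5 - p_i) * pi), with combined p value 0.5 - arctan(T) / pi. The function name `cauchy_combine`, the equal default weights, and the numerical cutoffs are assumptions made for illustration; the abstract does not state which weights or implementation the authors use.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine p values with the Cauchy combination statistic.

    Illustrative sketch only: equal weights are assumed when none are
    given, and the cutoffs below are ad hoc numerical safeguards.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p value")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative, one per p value")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must not all be zero")

    t = 0.0
    for p, w in zip(p_values, weights):
        w = w / total  # normalize so the weights sum to 1
        if p < 1e-15:
            # tan((0.5 - p) * pi) is approximately 1 / (p * pi) for tiny p
            t += w / (p * math.pi)
        else:
            t += w * math.tan((0.5 - p) * math.pi)

    if t > 1e15:
        # tail approximation of the standard Cauchy survival function
        return 1.0 / (t * math.pi)
    return 0.5 - math.atan(t) / math.pi


# Hypothetical per-cluster p values for one SNP (made-up numbers).
cluster_p = [0.04, 0.20, 0.60]
print(cauchy_combine(cluster_p))
```

With a single input the function returns that p value unchanged, and identical inputs combine to the same value, which is a quick sanity check on the formula.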