连锁不平衡
全基因组关联研究
生物
计算生物学
遗传关联
遗传学
单核苷酸多态性
联动装置(软件)
聚类分析
核苷酸多型性
关联映射
计算机科学
基因型
基因
机器学习
作者
Zitong Li,Petri Kemppainen,Pasi Rastas,Juha Merilä
标识
DOI:10.1111/1755-0998.12893
摘要
Genomewide association studies (GWAS) aim to identify genetic markers strongly associated with quantitative traits by utilizing linkage disequilibrium (LD) between candidate genes and markers. However, because of LD between nearby genetic markers, the standard GWAS approaches typically detect a number of correlated SNPs covering long genomic regions, making corrections for multiple testing overly conservative. Additionally, the high dimensionality of modern GWAS data poses considerable challenges for GWAS procedures such as permutation tests, which are computationally intensive. We propose a cluster-based GWAS approach that first divides the genome into many large nonoverlapping windows and uses linkage disequilibrium network analysis in combination with principal component (PC) analysis as dimensional reduction tools to summarize the SNP data to independent PCs within clusters of loci connected by high LD. We then introduce single- and multilocus models that can efficiently conduct the association tests on such high-dimensional data. The methods can be adapted to different model structures and used to analyse samples collected from the wild or from biparental F2 populations, which are commonly used in ecological genetics mapping studies. We demonstrate the performance of our approaches with two publicly available data sets from a plant (Arabidopsis thaliana) and a fish (Pungitius pungitius), as well as with simulated data.
科研通智能强力驱动
Strongly Powered by AbleSci AI