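Homogeneous
Whole-genome sequencing
Genome
Evolutionary biology
Hardy-Weinberg principle
Computational biology
Machine learning
Computer science
Biology
Econometrics
Artificial intelligence
Mathematics
Genetics
Combinatorics
Gene
Allele frequency
Genotype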
Authors
Derek Shyr, Rounak Dey, Xihao Li, Hufeng Zhou, George M. Weinstock, Steven Buyske, Mark J. Daly, Richard A. Gibbs, Ira M. Hall, Tara C. Matise, Catherine Reeves, Nathan O. Stitziel, Michael C. Zody, Benjamin M. Neale, Xihong Lin
Identifier
DOI:10.1016/j.ajhg.2024.08.018
Abstract
Large-scale, multi-ethnic whole-genome sequencing (WGS) studies, such as the National Human Genome Research Institute Genome Sequencing Program's Centers for Common Disease Genomics (CCDG), play an important role in increasing diversity in genetic research. Before performing association analyses, assessing Hardy-Weinberg equilibrium (HWE) is a crucial quality control step to remove low-quality variants and ensure valid downstream analyses. Diverse WGS studies contain ancestrally heterogeneous samples; however, commonly used HWE methods assume that the samples are homogeneous. Therefore, directly applying these methods to the whole dataset can yield statistically invalid results. To account for this heterogeneity, HWE can be tested on subsets of samples that have genetically homogeneous ancestries, with the results aggregated at each variant. To facilitate valid HWE subset testing, we developed a semi-supervised learning approach that predicts homogeneous ancestries based on genotype. This method provides a convenient tool for assessing HWE in the presence of population structure and missing self-reported race and ethnicity in diverse WGS studies. In addition, assessing HWE within the homogeneous ancestries provides reliable HWE estimates that will directly benefit downstream analyses, including association analyses in WGS studies. We applied our proposed method to the CCDG dataset, predicting homogeneous genetic ancestry groups for 60,545 multi-ethnic WGS samples to assess HWE within each group.
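
The reason pooled HWE testing fails on heterogeneous samples can be illustrated with a small sketch. The Python below is not the authors' method: it uses a textbook 1-degree-of-freedom chi-square HWE test, and the names `hwe_chisq_pvalue` and `flag_variant`, the genotype counts, and the aggregation rule (flag a variant when the smallest per-group p-value falls below a Bonferroni-adjusted threshold) are all assumptions chosen for illustration. The abstract does not state how per-group results are aggregated.

```python
import math


def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square HWE test from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: nothing to test
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(stat / 2.0))


def flag_variant(group_counts, alpha=1e-6):
    """Hypothetical aggregation rule (an assumption, not from the paper):
    flag the variant if any group's p-value is below alpha / n_groups."""
    pvals = {g: hwe_chisq_pvalue(*c) for g, c in group_counts.items()}
    if not pvals:
        return False, pvals
    return min(pvals.values()) < alpha / len(pvals), pvals


# Made-up counts: each group is in exact HWE, with allele
# frequencies 0.1 and 0.9 respectively.
groups = {"group_1": (10, 180, 810), "group_2": (810, 180, 10)}
pooled = tuple(sum(c) for c in zip(*groups.values()))  # (820, 360, 820)

print("per-group:", flag_variant(groups))
print("pooled:   ", flag_variant({"pooled": pooled}))
```

In this toy case each group passes on its own, while the pooled sample shows a strong heterozygote deficit (the Wahlund effect) and would be flagged, even though nothing is wrong with the variant. Testing within ancestry groups avoids that false signal.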