Inference
Evolutionary biology
R package
Computer science
Geography
Genealogy
Artificial intelligence
Biology
Statistics
Mathematics
History
Authors
Jessica Honorato‐Mauer,Nirav N. Shah,Adam X. Maihofer,Clement C. Zai,Síntia Belangero,Caroline M. Nievergelt,Marcos Santoro,Elizabeth G. Atkinson
Identifier
DOI:10.1101/2024.08.26.609770
Abstract
In recent years, significant efforts have been made to improve methods for genomic studies of admixed populations using Local Ancestry Inference (LAI). Accurate LAI is crucial to ensure that downstream analyses correctly reflect the genetic ancestry of research participants. Here, we test analytic strategies for LAI to provide guidelines for optimal accuracy, focusing on admixed populations reflective of Latin America's primary continental ancestries: African (AFR), Amerindigenous (AMR), and European (EUR). Simulating LD-informed admixed haplotypes under a variety of two- and three-way admixture models, we implemented a standard LAI pipeline, testing three reference panel compositions to quantify their overall and ancestry-specific accuracy. We examined LAI miscall frequencies and true positive rates (TPR) across simulation models and continental ancestries. AMR tracts had notably lower LAI accuracy than EUR and AFR tracts in all comparisons, with mean TPR ranging from 88-94% for AMR, 96-99% for EUR, and 98-99% for AFR. When LAI miscalls occurred, they most frequently assigned European ancestry to truly Amerindigenous sites. With a reference panel well matched to the target population, even one with a smaller sample size, LAI produced true-positive estimates that were not statistically different from those obtained with a larger but mismatched reference panel, while being more computationally efficient. Although these trends respond directly to admixed Latin American cohort compositions, they are broadly useful for informing best practices for LAI across other admixed populations. Our findings reinforce the need to include more underrepresented populations in sequencing efforts to improve reference panels.
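The abstract reports ancestry-specific true positive rates and miscall frequencies without spelling out how they are computed. As a minimal sketch, assuming TPR for an ancestry is the fraction of sites truly of that ancestry that LAI also assigns to it, the two quantities could be tallied as below. The function names and the toy site labels are hypothetical illustrations, not the authors' pipeline or data.

```python
from collections import Counter

# Ancestry labels taken from the abstract.
ANCESTRIES = ("AFR", "AMR", "EUR")


def per_ancestry_tpr(true_calls, inferred_calls):
    """Return {ancestry: TPR}, where TPR is the share of sites truly of
    that ancestry that were also inferred as that ancestry (assumed definition)."""
    if len(true_calls) != len(inferred_calls):
        raise ValueError("true and inferred calls must have the same length")
    totals = Counter(true_calls)
    correct = Counter(t for t, p in zip(true_calls, inferred_calls) if t == p)
    # Skip ancestries absent from the truth set to avoid dividing by zero.
    return {a: correct[a] / totals[a] for a in ANCESTRIES if totals[a] > 0}


def miscall_counts(true_calls, inferred_calls):
    """Count miscalls as (true ancestry, inferred ancestry) pairs."""
    return Counter(
        (t, p) for t, p in zip(true_calls, inferred_calls) if t != p
    )


# Toy example (made-up labels): one true AMR site is miscalled as EUR.
true_calls = ["AFR", "AFR", "AMR", "AMR", "AMR", "AMR", "EUR", "EUR"]
inferred_calls = ["AFR", "AFR", "AMR", "EUR", "AMR", "AMR", "EUR", "EUR"]

tpr = per_ancestry_tpr(true_calls, inferred_calls)
miscalls = miscall_counts(true_calls, inferred_calls)
print(tpr)
print(miscalls)
```

In this toy input the only miscall is a true AMR site labeled EUR, which mirrors the direction of error the abstract describes as most frequent.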