A harmonized public resource of deeply sequenced diverse human genomes

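Keywords: 1000 Genomes Project, Biology, Imputation (statistics), Index, Genome, Computational biology, Population, Genomics, Data quality, Data science, Genetics, Evolutionary biology, Data mining, Computer science, Missing data, Gene, Metric (unit), Single-nucleotide polymorphism, Machine learning, Economics, Demography, Sociology, Genotype, Operations management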
Authors
Zan Koenig, Mary T. Yohannes, Lethukuthula L. Nkambule, Xuefang Zhao, Julia K. Goodrich, Heesu Ally Kim, Michael W. Wilson, Grace Tiao, Stephanie P. Hao, Nareh Sahakian, Katherine R. Chao, Mark A. Walker, Yunfei Lyu, Heidi L. Rehm, Benjamin M. Neale, Michael E. Talkowski, Mark J. Daly, Harrison Brand, Konrad J. Karczewski, Elizabeth G. Atkinson, Alicia R. Martin
Source
Journal: Genome Research [Cold Spring Harbor Laboratory]
Volume (issue): 34 (5), pages 796-809; citations: 14
Identifier
DOI:10.1101/gr.278378.123
Abstract

Underrepresented populations are often excluded from genomic studies owing in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also show substantial added value from this data set compared with the prior versions of the component resources, typically combined via liftOver and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared with previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality-control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.