Inference of Population Structure using Dense Haplotype Data

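Biology, Linkage disequilibrium, Inference, Population, Cluster analysis, Interpretability, Principal component analysis, Haplotype, Linkage (software), Genetics, Evolutionary biology, Computational biology, Data mining, Artificial intelligence, Computer science, Demography, Sociology, Gene, Genotype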
Authors
Daniel J. Lawson,Garrett Hellenthal,Simon Myers,Daniel Falush
Source
Journal: PLOS Genetics [Public Library of Science]
Volume (issue): 8 (1): e1002453-e1002453 Citations: 1137
Identifier
DOI:10.1371/journal.pgen.1002453
Abstract

The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haplotype similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are reconstructed using chunks of DNA donated by the other individuals. Results of this "chromosome painting" can be summarized as a "coancestry matrix," which directly reveals key information about ancestral relationships among individuals. If markers are viewed as independent, we show that this matrix almost completely captures the information used by both standard Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE in a unified manner. Furthermore, when markers are in linkage disequilibrium, the matrix combines information across successive markers to increase the ability to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based approach to identify discrete populations using this matrix, which offers advantages over PCA in terms of interpretability and over existing clustering algorithms in terms of speed, number of separable populations, and sensitivity to subtle population structure. We analyse Human Genome Diversity Panel data for 938 individuals and 641,000 markers, and we identify 226 populations reflecting differences on continental, regional, local, and family scales. We present multiple lines of evidence that, while many methods capture similar information among strongly differentiated groups, more subtle population structure in human populations is consistently present at a much finer level than currently available geographic labels and is only captured by the haplotype-based approach. 
The software used for this article, ChromoPainter and fineSTRUCTURE, is available from http://www.paintmychromosomes.com/.
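
The painting idea in the abstract (each individual is a recipient rebuilt from chunks donated by the others, with the chunks tallied into a coancestry matrix) can be illustrated with a toy sketch. This is not ChromoPainter's method or interface: it assumes a simple greedy rule (always copy the longest exactly matching stretch from any other haplotype) in place of the software's statistical model, and the names `match_length`, `toy_coancestry`, and the example haplotypes are hypothetical.

```python
import numpy as np


def match_length(recipient, donor, start):
    """Count consecutive sites, from `start`, where donor equals recipient."""
    mismatches = np.nonzero(recipient[start:] != donor[start:])[0]
    if mismatches.size == 0:
        return len(recipient) - start
    return int(mismatches[0])


def toy_coancestry(haps):
    """Greedy stand-in for chromosome painting (an assumption, not the paper's model).

    haps: (n_haplotypes, n_sites) array of 0/1 alleles.
    Returns counts[i, j] = number of chunks recipient i copied from donor j.
    """
    n, n_sites = haps.shape
    counts = np.zeros((n, n), dtype=int)
    for i in range(n):
        pos = 0
        while pos < n_sites:
            best_donor, best_len = None, 0
            for j in range(n):
                if j == i:
                    continue  # a recipient never donates to itself
                length = match_length(haps[i], haps[j], pos)
                if length > best_len:
                    best_donor, best_len = j, length
            if best_donor is None:
                pos += 1  # no donor matches this site; leave it unpainted
            else:
                counts[i, best_donor] += 1
                pos += best_len
    return counts


# Hypothetical data: two groups of three haplotypes, each with one private variant.
group_a = np.zeros((3, 9), dtype=int)
for row, site in enumerate([1, 4, 7]):
    group_a[row, site] = 1
group_b = 1 - group_a  # complementary background, same private-variant sites
haps = np.vstack([group_a, group_b])

counts = toy_coancestry(haps)
within = counts[:3, :3].sum() + counts[3:, 3:].sum()
between = counts[:3, 3:].sum() + counts[3:, :3].sum()
print(counts)
print("within-group chunks:", within, "between-group chunks:", between)
```

In this toy example most chunks are copied from haplotypes in the same group, which is the kind of signal a coancestry matrix is meant to expose. The greedy rule was chosen only because it fits in a few lines; it ignores recombination rates and mutation, which a real painting model would have to handle.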
