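Reference genome
Index
Genome
Human genetic variation
Computational biology
Workflow
Exome
Genotyping
Biology
Human genome
1000 Genomes Project
Exome sequencing
Computer science
Genomics
Pipeline (software)
Genetics
Single-nucleotide polymorphism
Gene
Mutation
Database
Genotype
Programming language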
Authors
Daniel Valenzuela, Tuukka Norri, Niko Välimäki, Esa Pitkänen, Veli Mäkinen
Source
Journal: BMC Genomics
[Springer Nature]
Date: 2018-05-01
Volume/Issue: 19 (S2)
Citations: 38
Identifier
DOI:10.1186/s12864-018-4465-8
Abstract
A typical human genome differs from the reference genome at 4-5 million sites. This diversity is increasingly catalogued in repositories such as ExAC/gnomAD, which comprise more than 15,000 whole genomes and more than 126,000 exome sequences from different individuals. Despite this enormous diversity, resequencing data workflows are still based on a single human reference genome. Identification and genotyping of genetic variants are typically carried out on short-read data aligned to a single reference, disregarding the underlying variation. We propose a new unified framework for variant calling with short-read data that uses a representation of human genetic variation: a pan-genomic reference. We provide a modular pipeline that can be seamlessly incorporated into existing sequencing data analysis workflows. Our tool is open source and available online: https://gitlab.com/dvalenzu/PanVC . Our experiments show that replacing a standard human reference with a pan-genomic one improves the accuracy of both single-nucleotide variant calling and short indel calling over the widely adopted Genome Analysis Toolkit (GATK) in difficult genomic regions.