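Variation (astronomy)
Scalability
Structural variation
Computer science
Reference genome
Population
Genome
DNA sequencing
Biology
Theoretical computer science
Algorithm
Genetics
DNA
Gene
Database
Physics
Sociology
Demography
Astrophysics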
Authors
Erik Garrison, Jouni Sirén, Adam M. Novak, Glenn Hickey, Jordan M. Eizenga, Eric T. Dawson, William E. Jones, Shilpa Garg, Charles Markello, Michael Lin, Benedict Paten, Richard Durbin
Abstract
Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual's genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large-scale structural variation such as inversions and duplications. Previous graph genome software implementations have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and using these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalized compressed suffix arrays, with improved accuracy over alignment to a linear reference, and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at a gigabase scale, or at the topological complexity of de novo assemblies.
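To make the idea of a bidirected DNA sequence graph concrete, here is a minimal Python sketch. It is an illustration only and is not vg's data model or API: the `VariationGraph` class, its methods, and the four-node example with a single-base variant are all hypothetical names and data chosen for this sketch. Each node holds a sequence, each edge joins two oriented nodes, and a walk spells a sequence, reading a node as its reverse complement when it is traversed backwards.

```python
from dataclasses import dataclass, field

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class VariationGraph:
    """Toy bidirected sequence graph (hypothetical, not vg's implementation)."""
    nodes: dict = field(default_factory=dict)  # node id -> DNA sequence
    edges: set = field(default_factory=set)    # ((id, is_reverse), (id, is_reverse))

    def add_node(self, node_id: int, seq: str) -> None:
        self.nodes[node_id] = seq

    def add_edge(self, src: int, src_rev: bool, dst: int, dst_rev: bool) -> None:
        # An edge can be read from either end: a -> b on the forward strand
        # is the same adjacency as b -> a on the reverse strand.
        self.edges.add(((src, src_rev), (dst, dst_rev)))
        self.edges.add(((dst, not dst_rev), (src, not src_rev)))

    def spell(self, walk: list) -> str:
        """Spell the sequence of a walk given as [(node_id, is_reverse), ...]."""
        for step, nxt in zip(walk, walk[1:]):
            if (step, nxt) not in self.edges:
                raise ValueError(f"no edge between {step} and {nxt}")
        return "".join(
            reverse_complement(self.nodes[n]) if rev else self.nodes[n]
            for n, rev in walk
        )


def build_example() -> VariationGraph:
    """Build a four-node graph with one single-base variant (T or C)."""
    g = VariationGraph()
    g.add_node(1, "GAT")
    g.add_node(2, "T")    # one allele
    g.add_node(3, "C")    # the other allele
    g.add_node(4, "ACA")
    for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4)]:
        g.add_edge(src, False, dst, False)
    return g


if __name__ == "__main__":
    g = build_example()
    print(g.spell([(1, False), (2, False), (4, False)]))  # first allele
    print(g.spell([(1, False), (3, False), (4, False)]))  # second allele
    print(g.spell([(4, True), (2, True), (1, True)]))     # reverse strand
```

Both alleles live in the same structure and share the flanking nodes, which is the compactness the abstract refers to. An inversion could be expressed in this sketch by adding an edge that enters a node in reverse orientation, for example `g.add_edge(1, False, 2, True)`.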