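De Bruijn sequence
De Bruijn graph
Genome
Computer science
Sequence assembly
k-mer
Genomics
Theoretical computer science
Graph
Computational biology
Biology
Genetics
Mathematics
Gene
Combinatorics
Gene expression
Transcriptome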
Authors
Barış Ekim, Bonnie Berger, Rayan Chikhi
Source
Journal: Cell Systems
[Elsevier]
Date: 2021-10-01
Volume (Issue): 12 (10): 958-968.e6
Citations: 54
Identifier
DOI:10.1016/j.cels.2021.08.009
Abstract
DNA sequencing data continue to progress toward longer reads with increasingly lower sequencing error rates. Here, we define an algorithmic approach, mdBG, that makes use of minimizer-space de Bruijn graphs to enable long-read genome assembly. mdBG achieves orders-of-magnitude improvement in both speed and memory usage over existing methods without compromising accuracy. A human genome is assembled in under 10 min using 8 cores and 10 GB RAM, and 60 Gbp of metagenome reads are assembled in 4 min using 1 GB RAM. In addition, we constructed a minimizer-space de Bruijn graph-based representation of 661,405 bacterial genomes, comprising 16 million nodes and 45 million edges, and successfully searched it for anti-microbial resistance (AMR) genes in 12 min. We expect our advances to be essential to sequence analysis, given the rise of long-read sequencing in genomics, metagenomics, and pangenomics. Code for constructing mdBGs is freely available for download at https://github.com/ekimb/rust-mdbg/.
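
The abstract names minimizer-space de Bruijn graphs without showing what one looks like. The sketch below is a toy illustration under stated assumptions, not the rust-mdbg implementation: each read is reduced to the ordered list of its l-mers whose hash falls below a density threshold, runs of k consecutive minimizers become nodes, and two nodes are linked when they overlap by k-1 minimizers. The names `lmer_hash`, `to_minimizer_space`, and `build_mdbg`, the parameters `k`, `l`, and `density`, and the choice of BLAKE2 as the hash are all hypothetical. Reverse complements, sequencing-error handling, and graph compaction are deliberately left out.

```python
import hashlib


def lmer_hash(lmer: str) -> int:
    """Hash an l-mer to a 64-bit integer (BLAKE2 is an arbitrary choice here)."""
    digest = hashlib.blake2b(lmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def to_minimizer_space(seq: str, l: int, density: float) -> list[str]:
    """Keep, in read order, every l-mer whose hash is below density * 2**64."""
    threshold = int(density * 2**64)
    lmers = (seq[i:i + l] for i in range(len(seq) - l + 1))
    return [m for m in lmers if lmer_hash(m) < threshold]


def build_mdbg(reads: list[str], k: int, l: int, density: float):
    """Build a toy minimizer-space de Bruijn graph.

    Nodes are tuples of k consecutive minimizers; an edge u -> v exists
    when the last k-1 minimizers of u equal the first k-1 minimizers of v.
    """
    nodes = set()
    for read in reads:
        mins = to_minimizer_space(read, l, density)
        for i in range(len(mins) - k + 1):
            nodes.add(tuple(mins[i:i + k]))

    # Index nodes by their (k-1)-minimizer prefix to find successors quickly.
    by_prefix: dict[tuple, list[tuple]] = {}
    for node in nodes:
        by_prefix.setdefault(node[:-1], []).append(node)

    edges = {(u, v) for u in nodes for v in by_prefix.get(u[1:], ())}
    return nodes, edges


if __name__ == "__main__":
    # Hypothetical toy input; real long reads are thousands of bases long.
    nodes, edges = build_mdbg(["ACGTACGGTCA", "GTACGGTCATT"], k=2, l=3, density=0.5)
    print(len(nodes), "nodes,", len(edges), "edges")
```

With `density=1.0` every l-mer is kept and the structure reduces to an ordinary de Bruijn graph over l-mers. Lowering the density shrinks the node set, which is the intuition behind the speed and memory gains the abstract reports.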