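Reference genome
Computer science
Lift (data mining)
Genome
Set (abstract data type)
Software
Computational biology
Sequence assembly
Data mining
Biology
Genetics
Gene
Programming language
Gene expression
Transcriptome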
Authors
Nae-Chyun Chen, Luis F. Paulin, Fritz J. Sedlazeck, Sergey Koren, Adam M. Phillippy, Ben Langmead
Identifier
DOI:10.1101/2022.04.27.489683
Abstract
Complete, telomere-to-telomere genome assemblies promise improved analyses and the discovery of new variants, but many essential genomic resources remain associated with older reference genomes. Thus, there is a need to translate genomic features and read alignments between references. Here we describe a new method called levioSAM2 that accounts for reference changes and performs fast and accurate lift-over between assemblies using a whole-genome map. In addition to enabling the use of multiple references, we demonstrate that aligning reads to a high-quality reference (e.g. T2T-CHM13) and lifting to an older reference (e.g. GRCh38) actually improves the accuracy of the resulting variant calls on the old reference. By leveraging the quality improvements of T2T-CHM13, levioSAM2 reduces small-variant calling errors by 11.4-39.5% compared to GRC-based mapping using real Illumina datasets. LevioSAM2 also improves long-read-based structural variant calling and reduces errors by 3.8-11.8% for a PacBio HiFi dataset. Performance is especially improved for a set of complex, medically relevant genes, where the GRC references are of lower quality. The software is available at https://github.com/milkschen/leviosam2 under the MIT license.
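To make the idea of coordinate lift-over concrete, the following is a minimal sketch of translating a position from one assembly to another through a list of aligned blocks. It is not levioSAM2's implementation and does not use its API: the `Block` type, the `lift_position` function, and the example coordinates are all hypothetical, and the sketch assumes a single chromosome, forward-strand blocks only, and blocks that are sorted and non-overlapping in the source assembly.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class Block(NamedTuple):
    """One gap-free aligned block between two assemblies (hypothetical type)."""
    src_start: int  # 0-based, inclusive start on the source assembly
    src_end: int    # exclusive end on the source assembly
    dst_start: int  # 0-based start of the same block on the target assembly


def lift_position(blocks: Sequence[Block], pos: int) -> Optional[int]:
    """Translate a source position to the target assembly.

    Assumes `blocks` is sorted by src_start and non-overlapping.
    Returns None when `pos` falls in a gap that no block covers.
    """
    starts = [b.src_start for b in blocks]
    # Index of the last block whose start is <= pos.
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    block = blocks[i]
    if pos >= block.src_end:
        return None  # pos lies after this block, i.e. in an unaligned gap
    return block.dst_start + (pos - block.src_start)


# Toy map: the second block is shifted by +50 and the third by +100,
# with an unaligned gap covering source positions 200..249.
toy_map = [Block(0, 100, 0), Block(100, 200, 150), Block(250, 300, 350)]

lifted = [lift_position(toy_map, p) for p in (50, 120, 220, 299, 300)]
```

Positions inside a block shift by that block's offset, while positions in a gap have no counterpart on the target. A real lift-over of read alignments must additionally handle strand, multiple contigs, and alignments that span block boundaries, none of which this sketch attempts.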