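Index terms
Computer science
Indel mutation
Precision and recall
Structural variation
Set (abstract data type)
Recall
Computational biology
Pattern recognition (psychology)
Artificial intelligence
Genetics
Biology
Gene
Genome
Genotype
Single-nucleotide polymorphism
Linguistics
Philosophy
Programming language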
Authors
Shunichi Kosugi, Chikashi Terao
Identifier
DOI:10.1038/s41439-024-00276-x
Abstract
Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.