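Computer science
Exome sequencing
Computational biology
DNA sequencing
Robustness (evolution)
Standardization
Raw data
Pipeline (software)
Reference genome
Data mining
Bioinformatics
Biology
Genetics
Gene
Mutation
Operating system
Programming language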
Authors
Ziyang Li, Shuangsang Fang, Rui Zhang, Lijia Yu, Jiawei Zhang, Dechao Bu, Liang Sun, Yi Zhao, Jinming Li
Identifier
DOI:10.1016/j.jmoldx.2020.11.010
Abstract
Next-generation sequencing is increasingly being adopted as a valuable method for the detection of somatic variants in clinical oncology. However, it is still challenging to reach a satisfactory level of robustness and standardization in clinical practice when using the currently available bioinformatics pipelines to detect variants from raw sequencing data. Moreover, appropriate reference data sets are lacking for clinical bioinformatics pipeline development, validation, and proficiency testing. Herein, we developed the Variant Benchmark tool (VarBen), an open-source software for variant simulation to generate customized reference data sets by directly editing the original sequencing reads. VarBen can introduce a variety of variants, including single-nucleotide variants, small insertions and deletions, and large structural variants, into targeted, exome, or whole-genome sequencing data, and can handle sequencing data from both the Illumina and Ion Torrent sequencing platforms. To demonstrate the feasibility and robustness of VarBen, we performed variant simulation on different sequencing data sets and compared the simulated variants with real-world data. The validation study showed that the simulated data are highly comparable to real-world data and that VarBen is a reliable tool for variant simulation. In addition, our collaborative study of somatic variant calling in 20 laboratories emphasizes the need for laboratories to evaluate their bioinformatics pipelines with customized reference data sets. VarBen may help users develop and validate their bioinformatics pipelines using locally generated sequencing data.
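The idea of simulating a variant by editing existing reads can be illustrated with a toy example. The sketch below is not VarBen's implementation or interface; `Read` and `spike_snv` are hypothetical names, and the reads are assumed to be gap-free alignments held in memory. It overwrites one base in a chosen fraction of the reads covering a position, which approximates a single-nucleotide variant at a target allele frequency. A real tool would presumably work on aligned sequencing files and would also need to handle indels, clipping, and paired reads, none of which is modelled here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical in-memory read: gap-free alignment, no clipping."""
    name: str
    start: int  # 0-based reference position of the first base
    seq: str    # read bases


def spike_snv(reads, pos, alt, vaf, seed=0):
    """Return a new list of reads in which `alt` replaces the base at
    reference position `pos` in about `vaf` of the covering reads.

    The input list and its reads are left unmodified.
    """
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    covering = [
        i for i, r in enumerate(reads)
        if r.start <= pos < r.start + len(r.seq)
    ]
    n_edit = round(len(covering) * vaf)
    chosen = set(rng.sample(covering, n_edit))

    edited = []
    for i, r in enumerate(reads):
        if i in chosen:
            offset = pos - r.start  # position of the variant within the read
            new_seq = r.seq[:offset] + alt + r.seq[offset + 1:]
            edited.append(Read(r.name, r.start, new_seq))
        else:
            edited.append(r)
    return edited
```

For instance, calling `spike_snv(reads, pos=3, alt="A", vaf=0.3)` on ten reads that all cover position 3 edits three of them and leaves reads that do not cover the position untouched.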