A benchmarking study of individual somatic variant callers and voting-based ensembles for whole-exome sequencing

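Index terms: Somatic; Benchmarking; Exome sequencing; Exome; Computer science; Voting; Computational biology; Machine learning; Biology; Genetics; Mutation; Gene; Genotype; Single-nucleotide polymorphism; Marketing; Politics; Law; Business; Political science
Authors
Arnaud Guillé, José Adélaïde, Pascal Finetti, Fabrice André, Daniel Birnbaum, Émilie Mamessier, François Bertucci, Max Chaffanet
Source
Journal: Briefings in Bioinformatics [Oxford University Press]
Volume (issue): 26 (1)
Identifier
DOI: 10.1093/bib/bbae697
Abstract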

By identifying somatic mutations, whole-exome sequencing (WES) has become a technology of choice for diagnosing many cancers and guiding treatment decisions. Despite advances in somatic variant detection and the emergence of sophisticated tools incorporating machine learning, accurately identifying somatic variants remains challenging. New somatic variant callers are often accompanied by claims of superior performance over their predecessors. Furthermore, most comparative studies focus on a limited set of tools and reference datasets, leading to inconsistent results and making it difficult for laboratories to select the optimal solution. Our study comprehensively evaluated 20 somatic variant callers across four reference WES datasets. We subsequently assessed the performance of ensemble approaches by exploring all possible combinations of these callers, generating 8178 and 1013 combinations for single-nucleotide variants (SNVs) and indels, respectively, with varying voting thresholds. Our analysis identified five high-performing individual somatic variant callers: Muse, Mutect2, Dragen, TNScope, and NeuSomatic. For somatic SNVs, an ensemble combining LoFreq, Muse, Mutect2, SomaticSniper, Strelka, and Lancet outperformed the top-performing individual caller (Dragen) by >3.6% (mean F1 score = 0.927). Similarly, for somatic indels, an ensemble of Mutect2, Strelka, Varscan2, and Pindel outperformed the best individual caller (NeuSomatic) by >3.5% (mean F1 score = 0.867). By considering the computational cost of each combination, we identified an optimal solution involving four somatic variant callers (Muse, Mutect2, and Strelka for SNVs; Mutect2, Strelka, and Varscan2 for indels), enabling accurate and cost-effective somatic variant detection in whole-exome sequencing.
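
The voting-based ensemble idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`vote`, `f1_score`, `best_ensemble`, `n_combinations`), the representation of a variant as a `(chrom, pos, ref, alt)` tuple, and the toy call sets are all assumptions made for illustration. A variant is kept when at least `threshold` callers in a combination report it, and each combination is scored by F1 against a truth set. The abstract does not say how many callers entered the SNV and indel ensembles; the counts 8178 and 1013 happen to equal 2^n - n - 1 (all subsets of at least two callers) for n = 13 and n = 10, which is an inference, not a statement from the source.

```python
from collections import Counter
from itertools import combinations


def vote(call_sets, threshold):
    """Keep variants reported by at least `threshold` of the call sets."""
    counts = Counter(v for calls in call_sets for v in set(calls))
    return {v for v, n in counts.items() if n >= threshold}


def f1_score(predicted, truth):
    """Harmonic mean of precision and recall; 0.0 when there is no true positive."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


def best_ensemble(calls_by_caller, truth):
    """Try every combination of >= 2 callers and every voting threshold.

    Returns (best F1, caller names, threshold). Exhaustive search, so it is
    only practical for a modest number of callers.
    """
    best = (0.0, (), 0)
    names = sorted(calls_by_caller)
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            sets = [calls_by_caller[name] for name in combo]
            for threshold in range(1, size + 1):
                score = f1_score(vote(sets, threshold), truth)
                if score > best[0]:
                    best = (score, combo, threshold)
    return best


def n_combinations(n_callers):
    """Number of subsets containing at least two callers: 2^n - n - 1."""
    return 2 ** n_callers - n_callers - 1


# Hypothetical toy data: variants as (chrom, pos, ref, alt) tuples.
A, B, C, D = [("chr1", p, "A", "T") for p in (100, 200, 300, 400)]
X, Y, Z = [("chr2", p, "G", "C") for p in (10, 20, 30)]  # false positives
truth = {A, B, C, D}
calls_by_caller = {
    "caller_a": {A, B, X},
    "caller_b": {A, B, C, Y},
    "caller_c": {B, C, D, Z},
}

if __name__ == "__main__":
    score, combo, threshold = best_ensemble(calls_by_caller, truth)
    print(f"best F1={score:.3f} with {combo} at threshold {threshold}")
    print(n_combinations(13), n_combinations(10))
```

In this toy example, requiring two of three callers to agree removes the caller-specific false positives at the cost of one missed variant, which is the precision/recall trade-off that the voting threshold controls. Real call sets would typically be parsed from normalized VCF files before comparison.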
