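Computer science
Cluster analysis
Data mining
Sequence assembly
Software
Computational biology
Artificial intelligence
Programming language
Biology
Biochemistry
Gene
Gene expression
Transcriptome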
Authors
Yuansheng Liu, Xiaocai Zhang, Quan Zou, Xiangxiang Zeng
Source
Journal: Bioinformatics
[Oxford University Press]
Date: 2020-10-28
Volume (Issue): 37 (11): 1604-1606
Citations: 15
Identifier
DOI: 10.1093/bioinformatics/btaa915
Abstract
Summary: Removing duplicate and near-duplicate reads generated by high-throughput sequencing technologies can reduce the computational resources required by downstream applications. Here we develop minirmd, a de novo tool that removes duplicate reads via multiple rounds of clustering using minimizers of different lengths. Experiments demonstrate that minirmd removes more near-duplicate reads than existing clustering approaches and is faster than existing multi-core tools. To the best of our knowledge, minirmd is the first tool to remove near-duplicates on the reverse-complementary strand.
Availability and implementation: https://github.com/yuansliu/minirmd
Supplementary information: Supplementary data are available at Bioinformatics online.
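The idea summarized in the abstract (bucketing reads by a minimizer, repeating with several minimizer lengths, and treating a read and its reverse complement as the same) can be illustrated with a small sketch. This is not minirmd's implementation: the function names (`minimizer`, `oriented`, `dedup_round`, `dedup`), the choice of the lexicographically smallest k-mer as the minimizer, the Hamming-distance comparison, and the default values of `ks` and `max_mismatches` are all assumptions made for illustration. It also assumes equal-length reads over the alphabet A, C, G, T and substitution-only differences.

```python
from collections import defaultdict

# Translation table for complementing A/C/G/T bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def minimizer(seq, k):
    """Lexicographically smallest k-mer of seq (whole seq if shorter than k)."""
    return min(seq[i:i + k] for i in range(max(1, len(seq) - k + 1)))


def oriented(seq, k):
    """Pick the strand with the smaller (minimizer, sequence) pair.

    A read and its reverse complement yield the same pair, so they
    fall into the same bucket in the same orientation.
    """
    rc = reverse_complement(seq)
    return min((minimizer(seq, k), seq), (minimizer(rc, k), rc))


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def dedup_round(reads, k, max_mismatches):
    """One clustering round: bucket by minimizer, drop near-duplicates."""
    buckets = defaultdict(list)  # minimizer -> representative sequences
    kept = []
    for read in reads:
        key, seq = oriented(read, k)
        reps = buckets[key]
        if any(len(r) == len(seq) and hamming(r, seq) <= max_mismatches
               for r in reps):
            continue  # near-duplicate of an earlier read in this bucket
        reps.append(seq)
        kept.append(read)
    return kept


def dedup(reads, ks=(8, 6, 4), max_mismatches=1):
    """Run several rounds with different (hypothetical) minimizer lengths."""
    for k in ks:
        reads = dedup_round(reads, k, max_mismatches)
    return reads


if __name__ == "__main__":
    r = "ACGTTGCAAC"
    sample = [r, r, reverse_complement(r), "ACGTTGCAAG", "TTTTTTTTTT"]
    print(dedup(sample))
```

A mismatch that changes a read's minimizer sends it to a different bucket, so a single round can miss some near-duplicates. Repeating the round with other values of `k` gives such pairs another chance to share a bucket, which is one plausible reading of why multiple minimizer lengths help.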