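Cluster analysis
Computer science
Visualization
Shotgun proteomics
Software
Data mining
Cluster (spacecraft)
Spectral clustering
Pairwise comparison
Graphics processing unit
Computational science
Artificial intelligence
Parallel computing
Chemistry
Proteomics
Operating system
Gene
Biochemistry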
Authors
Paul Ka Po To, Long Wu, Chak Ming Chan, Ayman Hoque, Henry Lam
Identifier
DOI:10.1021/acs.jproteome.1c00485
Abstract
Modern shotgun proteomics experiments generate gigabytes of spectra every hour, only a fraction of which were utilized to form biological conclusions. Instead of being stored as flat files in public data repositories, this large amount of data can be better organized to facilitate data reuse. Clustering these spectra by similarity can be helpful in building high-quality spectral libraries, correcting identification errors, and highlighting frequently observed but unidentified spectra. However, large-scale clustering is time-consuming. Here, we present ClusterSheep, a method utilizing Graphics Processing Units (GPUs) to accelerate the process. Unlike previously proposed algorithms for this purpose, our method performs true pairwise comparison of all spectra within a precursor mass-to-charge ratio tolerance, thereby preserving the full cluster structures. ClusterSheep was benchmarked against previously reported clustering tools, MS-Cluster, MaRaCluster, and msCRUSH. The software tool also functions as an interactive visualization tool with a persistent state, enabling the user to explore the resulting clusters visually and retrieve the clustering results as desired.
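The pairwise comparison described in the abstract can be illustrated with a minimal CPU-only sketch. This is not the ClusterSheep implementation: the function names (`cosine`, `cluster_spectra`), the sparse binned-peak representation, the cosine similarity measure, the thresholds `mz_tol` and `min_sim`, and the use of connected components to form clusters are all assumptions chosen for illustration. It only shows the idea of comparing every pair of spectra whose precursor m/z values lie within a tolerance, which is the step the abstract says is offloaded to GPUs.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse binned spectra (dict: bin -> intensity)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_spectra(spectra, mz_tol=1.0, min_sim=0.7):
    """Group spectra by all-pairs similarity within a precursor m/z window.

    spectra: list of (precursor_mz, {bin: intensity}) tuples (hypothetical format).
    Returns a sorted list of clusters, each a list of input indices.
    """
    # Sort indices by precursor m/z so each spectrum is only compared
    # with neighbours inside the tolerance window.
    order = sorted(range(len(spectra)), key=lambda i: spectra[i][0])
    parent = list(range(len(spectra)))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            if spectra[j][0] - spectra[i][0] > mz_tol:
                break  # later spectra are even further away in m/z
            if cosine(spectra[i][1], spectra[j][1]) >= min_sim:
                parent[find(i)] = find(j)  # link the two spectra

    clusters = {}
    for i in range(len(spectra)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    demo = [
        (500.0, {1: 1.0, 2: 1.0}),
        (500.2, {1: 1.0, 2: 0.9}),
        (500.3, {5: 1.0}),
        (700.0, {1: 1.0, 2: 1.0}),
    ]
    print(cluster_spectra(demo))
```

In this toy input the first two spectra share peaks and fall within the tolerance, so they are linked. The fourth has the same peaks as the first but a distant precursor m/z, so the pair is never compared. The inner loop is independent for each pair, which is why this kind of workload suits parallel hardware.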