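Computer science
Scalability
Cluster analysis
Mass cytometry
Single-cell analysis
Graph
Data mining
Cell
Artificial intelligence
Phenotype
Theoretical computer science
Biology
Database
Genetics
Biochemistry
Gene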
Authors
Shobana V. Stassen, Dickson M. D. Siu, Kelvin C. M. Lee, Joshua W. K. Ho, Hayden Kwok‐Hay So, Kevin K. Tsia
Source
Journal: Bioinformatics
[Oxford University Press]
Date: 2020-01-23
Volume (issue), pages: 36 (9): 2778-2786
Citations: 88
Identifier
DOI: 10.1093/bioinformatics/btaa042
Abstract
Motivation: New single-cell technologies continue to fuel the explosive growth in the scale of heterogeneous single-cell data. However, existing computational methods are inadequately scalable to large datasets and therefore cannot uncover the complex cellular heterogeneity.
Results: We introduce a highly scalable graph-based clustering algorithm, PARC (Phenotyping by Accelerated Refined Community-partitioning), for large-scale, high-dimensional single-cell data (>1 million cells). Using large single-cell flow and mass cytometry, RNA-seq and imaging-based biophysical data, we demonstrate that PARC consistently outperforms state-of-the-art clustering algorithms without subsampling of cells, including Phenograph, FlowSOM and Flock, in terms of both speed and ability to robustly detect rare cell populations. For example, PARC can cluster a single-cell dataset of 1.1 million cells within 13 min, compared with >2 h for the next fastest graph-clustering algorithm. Our work presents a scalable algorithm to cope with increasingly large-scale single-cell analysis.
Availability and implementation: https://github.com/ShobiStassen/PARC
Supplementary information: Supplementary data are available at Bioinformatics online.