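Cluster analysis
Computer science
Scalability
Data mining
Artificial intelligence
Feature (linguistics)
Sample (material)
Rand index
Clustering high-dimensional data
Machine learning
Pattern recognition (psychology)
Linguistics
Chromatography
Database
Philosophy
Chemistry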
Authors
Tian Tian, Ji Wan, Qi Song, Zhi Wei
Identifier
DOI:10.1038/s42256-019-0037-0
Abstract
Single-cell RNA sequencing (scRNA-seq) promises to provide higher resolution of cellular differences than bulk RNA sequencing. Clustering transcriptomes profiled by scRNA-seq has been routinely conducted to reveal cell heterogeneity and diversity. However, clustering analysis of scRNA-seq data remains a statistical and computational challenge, due to the pervasive dropout events obscuring the data matrix with prevailing ‘false’ zero count observations. Here, we have developed scDeepCluster, a single-cell model-based deep embedded clustering method, which simultaneously learns feature representation and clustering via explicit modelling of scRNA-seq data generation. Based on testing extensive simulated data and real datasets from four representative single-cell sequencing platforms, scDeepCluster outperformed state-of-the-art methods under various clustering performance metrics and exhibited improved scalability, with running time increasing linearly with sample size. Its accuracy and efficiency make scDeepCluster a promising algorithm for clustering large-scale scRNA-seq data.

Clustering groups of cells in single-cell RNA sequencing datasets can produce high-resolution information for complex biological questions. However, it is statistically and computationally challenging due to the low RNA capture rate, which results in a high number of false zero count observations. A deep learning approach called scDeepCluster, which efficiently combines a model for explicitly characterizing missing values with clustering, shows high performance and improved scalability with a computing time increasing linearly with sample size.
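The abstract describes a joint objective with two parts. One part explicitly models how the observed counts (including false zeros) are generated. The other part clusters the learned embedding. The sketch below is illustrative only and rests on two assumptions that the abstract does not state. First, it assumes a zero-inflated negative binomial (ZINB) likelihood, with mean `mu`, dispersion `theta` and dropout probability `pi`. Second, it assumes a DEC-style clustering loss: a Student's t soft assignment `q`, a sharpened target `p`, and KL(P‖Q). All function names are hypothetical, as is the weight `gamma`. In a full model the ZINB parameters and the embedding `z` would be produced by a neural network. Here they are passed in directly, so this is not the published implementation.

```python
import math

import numpy as np

# Vectorized log-gamma from the standard library (avoids a SciPy dependency).
_lgamma = np.vectorize(math.lgamma)


def zinb_nll(x, mu, theta, pi, eps=1e-10):
    """Mean negative log-likelihood of counts x under an assumed ZINB model."""
    x, mu, theta, pi = (np.asarray(a, dtype=float) for a in (x, mu, theta, pi))
    log_theta_mu = np.log(theta + mu + eps)
    # log NB(x | mu, theta)
    nb_log = (_lgamma(x + theta) - _lgamma(theta) - _lgamma(x + 1.0)
              + theta * (np.log(theta + eps) - log_theta_mu)
              + x * (np.log(mu + eps) - log_theta_mu))
    # A zero can come from the dropout component or from the NB itself.
    nb_zero = np.exp(theta * (np.log(theta + eps) - log_theta_mu))
    log_zero = np.log(pi + (1.0 - pi) * nb_zero + eps)
    log_nonzero = np.log(1.0 - pi + eps) + nb_log
    return float(-np.where(x < 0.5, log_zero, log_nonzero).mean())


def soft_assign(z, centers, alpha=1.0):
    """Student's t similarity q_ij between embeddings z (n x d) and centers (k x d)."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpened targets: p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)


def clustering_kl(p, q, eps=1e-10):
    """KL(P || Q) summed over clusters and averaged over cells."""
    return float((p * np.log((p + eps) / (q + eps))).sum(axis=1).mean())


def total_loss(x, mu, theta, pi, p, q, gamma=1.0):
    """Hypothetical joint objective: reconstruction NLL + gamma * clustering KL."""
    return zinb_nll(x, mu, theta, pi) + gamma * clustering_kl(p, q)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(2.0, size=(50, 20))
    counts[rng.random(counts.shape) < 0.3] = 0  # inject artificial dropouts
    mu = np.full(counts.shape, 2.0)
    theta = np.full(20, 5.0)                    # one dispersion per gene
    pi = np.full(counts.shape, 0.3)
    z = rng.normal(size=(50, 2))                # stand-in for a learned embedding
    centers = rng.normal(size=(3, 2))
    q = soft_assign(z, centers)
    p = target_distribution(q)
    print("loss:", total_loss(counts, mu, theta, pi, p, q, gamma=0.1))
    print("cluster labels:", q.argmax(axis=1)[:10])
```

The ZINB term scores a zero count as a mixture of a dropout event and a genuine negative-binomial zero. This is how such a model can avoid treating every false zero as true absence of expression. The target distribution squares and renormalizes `q`, which pushes each cell toward its most confident cluster. Minimizing the KL term against those targets therefore sharpens the clusters while the reconstruction term keeps the embedding faithful to the counts.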