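Cluster analysis
Imputation (statistics)
Computer science
Data mining
Artificial intelligence
Dropout (neural networks)
Hierarchical clustering
Pattern recognition (psychology)
Machine learning
Missing data
Authors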
Junlin Xu,Lingyu Cui,Jujuan Zhuang,Yajie Meng,Pingping Bing,Binsheng He,Geng Tian,Choi Kwok Pui,Taoyang Wu,Bing Wang,Jialiang Yang
Identifier
DOI:10.1016/j.compbiomed.2022.105697
Abstract
Recent advances in single-cell RNA sequencing (scRNA-seq) provide exciting opportunities for transcriptome analysis at single-cell resolution. Clustering individual cells is a key step in scRNA-seq analysis for revealing cell subtypes and inferring cell lineage. Although many dedicated algorithms have been proposed, clustering quality remains a computational challenge for scRNA-seq data, and it is exacerbated by inflated zero counts caused by various sources of technical noise. To address this challenge, we assess combinations of nine popular dropout imputation methods and eight clustering methods on a collection of 10 well-annotated scRNA-seq datasets with different sample sizes. Our results show that (i) imputation algorithms typically improve both the performance of clustering methods and the quality of data visualization using t-Distributed Stochastic Neighbor Embedding (t-SNE); and (ii) the performance of a particular combination of imputation and clustering methods varies with dataset size. For example, the combination of single-cell analysis via expression recovery (SAVER) and Sparse Subspace Clustering (SSC) usually works well on smaller datasets, while the combination of adaptively-thresholded low-rank approximation (ALRA) and single-cell interpretation via multi-kernel learning (SIMLR) usually achieves the best performance on larger datasets.
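The evaluation design described in the abstract, scoring every pairing of an imputation method with a clustering method against annotated cell labels, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the imputers (`no_imputation`, `gene_mean_imputation`) and clusterers (`kmeans`, `kmeans_log1p`) are naive hypothetical stand-ins rather than SAVER, ALRA, SSC, or SIMLR, the toy count matrix is invented, and the adjusted Rand index is used as the score only because the abstract does not name its metric.

```python
from collections import Counter
from itertools import product
from math import comb, log1p


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same cells."""
    n = len(truth)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case: both labelings are trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def no_imputation(matrix):
    """Baseline stand-in: leave the zero counts untouched."""
    return [list(row) for row in matrix]


def gene_mean_imputation(matrix):
    """Naive stand-in: replace each zero with the mean of that gene's nonzero counts."""
    means = []
    for col in zip(*matrix):
        observed = [v for v in col if v != 0]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[v if v != 0 else means[g] for g, v in enumerate(row)] for row in matrix]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(matrix, k=2, iterations=20):
    """Plain k-means with a deterministic farthest-point start (stand-in clusterer)."""
    centers = [list(matrix[0])]
    while len(centers) < k:
        centers.append(list(max(matrix, key=lambda r: min(sq_dist(r, c) for c in centers))))
    labels = [0] * len(matrix)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sq_dist(row, centers[j])) for row in matrix]
        for j in range(k):
            members = [row for row, lab in zip(matrix, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def kmeans_log1p(matrix, k=2):
    """Second stand-in clusterer: k-means on log1p-transformed values."""
    return kmeans([[log1p(v) for v in row] for row in matrix], k=k)


IMPUTERS = {"none": no_imputation, "gene_mean": gene_mean_imputation}
CLUSTERERS = {"kmeans": kmeans, "kmeans_log1p": kmeans_log1p}


def benchmark(matrix, truth):
    """Score every (imputer, clusterer) pairing against the annotated labels."""
    scores = {}
    for (imp_name, impute), (clu_name, cluster) in product(IMPUTERS.items(), CLUSTERERS.items()):
        scores[(imp_name, clu_name)] = adjusted_rand_index(truth, cluster(impute(matrix)))
    return scores


# Invented toy data: rows are cells, columns are genes, zeros mimic dropouts.
cells = [
    [9, 8, 1, 0],
    [0, 9, 0, 1],
    [8, 0, 1, 1],
    [1, 0, 9, 8],
    [0, 1, 0, 9],
    [1, 1, 8, 0],
]
truth = [0, 0, 0, 1, 1, 1]

scores = benchmark(cells, truth)
for (imp_name, clu_name), ari in sorted(scores.items()):
    print(f"{imp_name:>10} + {clu_name:<13} ARI = {ari:.3f}")
```

Keeping the imputers and clusterers in dictionaries means that a real method could be dropped in as another entry without touching the scoring loop. Scores from a toy matrix this small say nothing about how the methods in the paper behave.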