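Identification (biology)
Artificial intelligence
Scalability
Machine learning
Pipeline (software)
Data mining
Computer science
Similarity (geometry)
Workflow
Database
Biology
Botany
Image (mathematics)
Programming language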
Authors
Xiaobo Sun,Xiaochu Lin,Ziyi Li,Hao Wu
Abstract
Cell type identification is among the most important tasks in single-cell RNA-sequencing (scRNA-seq) analysis. Many in silico methods have been developed, and they can be roughly categorized as either supervised or unsupervised. In this study, we investigated the performance of 8 supervised and 10 unsupervised cell type identification methods using 14 public scRNA-seq datasets from different tissues, sequencing protocols and species. We investigated the impact of a number of factors, including the total number of cells, number of cell types, sequencing depth, batch effects, reference bias, cell population imbalance, unknown/novel cell types, and computational efficiency and scalability. Instead of merely comparing individual methods, we focused on how these factors affect supervised and unsupervised methods as general categories. We found that in most scenarios the supervised methods outperformed the unsupervised methods, except in the identification of unknown cell types. This is particularly true when the supervised methods use a reference dataset with high informational sufficiency, low complexity and high similarity to the query dataset. However, this advantage can be undermined by some of the undesired dataset properties investigated in this study, which lead to uninformative and biased reference datasets. In these scenarios, unsupervised methods can be comparable to supervised methods. Our study not only explains the behavior of cell typing methods under different experimental settings but also provides a general guideline for choosing a method according to the scientific goal and dataset properties. Finally, our evaluation workflow is implemented as a modularized R pipeline that allows future evaluation of new methods. Availability: All source code is available at https://github.com/xsun28/scRNAIdent.