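Computer science
Cluster analysis
Bottleneck
Hyperparameter
Artificial intelligence
Data mining
Autoencoder
Classifier (UML)
Noise reduction
Pattern recognition (psychology)
Deep learning
Artificial neural network
Preprocessor
Imputation (statistics)
Machine learning
Noise (video)
Authors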
Hui Li,Cory Brouwer,Weijun Luo
Identifier
DOI:10.1038/s41467-022-29576-y
Abstract
Single-cell RNA sequencing (scRNA-Seq) is widely used in biomedical research and has generated an enormous volume and diversity of data. The raw data contain multiple types of noise and technical artifacts, which need thorough cleaning. Existing denoising and imputation methods largely focus on a single type of noise (i.e., dropouts) and make strong distribution assumptions, which greatly limit their performance and application. Here we design and develop the AutoClass model, which integrates two deep neural network components, an autoencoder and a classifier, to maximize both noise removal and signal retention. AutoClass is distribution agnostic: it makes no assumption about specific data distributions and can therefore effectively clean a wide range of noise and artifacts. AutoClass outperforms state-of-the-art methods in multiple types of scRNA-Seq data analyses, including data recovery, differential expression analysis, clustering analysis, and batch effect removal. Importantly, AutoClass is robust to key hyperparameter settings, including bottleneck layer size, pre-clustering number, and classifier weight. We have made AutoClass open source at: https://github.com/datapplab/AutoClass.
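To make the role of the classifier weight concrete, the following is a minimal sketch of how a reconstruction term and a classification term could be combined into one training objective. It is not the AutoClass implementation. The convex combination `(1 - w) * reconstruction + w * classification`, the use of mean squared error and softmax cross-entropy, the idea that the class labels are pseudo-labels taken from the pre-clustering step, and all function names below are assumptions made for illustration; the abstract only states that an autoencoder and a classifier are integrated and that the classifier weight is a hyperparameter.

```python
import math


def mse(x, x_hat):
    """Mean squared error over all entries of two equal-shape matrices (lists of rows)."""
    total = sum((a - b) ** 2
                for row, row_hat in zip(x, x_hat)
                for a, b in zip(row, row_hat))
    count = sum(len(row) for row in x)
    return total / count


def softmax(logits):
    """Numerically stable softmax of one list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits_per_cell, labels):
    """Mean softmax cross-entropy; labels are integer class indices, one per cell."""
    losses = [-math.log(softmax(logits)[y])
              for logits, y in zip(logits_per_cell, labels)]
    return sum(losses) / len(losses)


def combined_loss(x, x_hat, logits, pseudo_labels, classifier_weight):
    """Hypothetical joint objective: a weighted sum of reconstruction and classification terms.

    x, x_hat       : input and reconstructed expression matrices (cells x genes)
    logits         : classifier outputs per cell (cells x clusters)
    pseudo_labels  : assumed to come from a pre-clustering step
    classifier_weight : value in [0, 1]; the weighting scheme is an assumption
    """
    w = classifier_weight
    return (1 - w) * mse(x, x_hat) + w * cross_entropy(logits, pseudo_labels)


# Toy example: 2 cells, 2 genes, 2 pre-clusters.
x = [[1.0, 0.0], [0.0, 2.0]]
x_hat = [[1.0, 0.0], [0.0, 1.0]]
logits = [[0.0, 0.0], [0.0, 0.0]]
pseudo_labels = [0, 1]
loss = combined_loss(x, x_hat, logits, pseudo_labels, classifier_weight=0.5)
```

Under this assumed form, a weight of 0 reduces the objective to a plain autoencoder and a weight of 1 to a plain classifier, which is one way to read the abstract's statement that both noise removal and signal retention are targeted.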