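Interpretability
Annotation
Computer science
Robustness (evolution)
Artificial intelligence
Encoder
Machine learning
Deep learning
Data type
Overfitting
Artificial neural network
Gene
Biology
Biochemistry
Programming language
Operating system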
Authors
Fan Yang, Wenchuan Wang, Fang Wang, Yuan Fang, Duyu Tang, Junzhou Huang, Hui Lü, Jianhua Yao
Identifier
DOI:10.1101/2021.12.05.471261
Abstract
Annotating cell types based on single-cell RNA-seq data is a prerequisite for research on disease progression and the tumor microenvironment. Here we show that existing annotation methods typically suffer from a lack of curated marker gene lists, improper handling of batch effects, and difficulty in leveraging latent gene-gene interaction information, which impairs their generalization and robustness. We developed scBERT (single-cell Bidirectional Encoder Representations from Transformers), a pre-trained deep neural network-based model, to overcome these challenges. Following BERT's pre-train and fine-tune approach, scBERT obtains a general understanding of gene-gene interactions by being pre-trained on large amounts of unlabeled scRNA-seq data, and is then transferred to the cell type annotation task on unseen, user-specific scRNA-seq data through supervised fine-tuning. Extensive and rigorous benchmark studies validated the superior performance of scBERT on cell type annotation, novel cell type discovery, robustness to batch effects, and model interpretability.