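Imputation (statistics)
Matrix decomposition
Computer science
Cluster analysis
Data mining
RNA-Seq
Dropout (neural networks)
Scripting language
Artificial intelligence
Transcriptome
Missing data
Machine learning
Gene
Eigenvector
Gene expression
Biology
Genetics
Physics
Quantum mechanics
Operating system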
Authors
Jiadi Zhu, Youlong Yang
Identifier
DOI:10.1142/s0219720023500294
Abstract
Single-cell RNA sequencing (scRNA-seq) has proven to be an effective technology for investigating heterogeneity and transcriptome dynamics because of its single-cell resolution. However, a major problem with scRNA-seq data is the excess of zeros in the count matrix, which severely hinders downstream analysis. Here, we present a method that integrates non-negative matrix factorization and transfer learning (NMFTL) to impute scRNA-seq data. It borrows gene expression information from an additional dataset and adds graph-regularized terms to the decomposed matrices. Together, these strategies maintain the intrinsic geometrical structure of the data, and the transfer term further improves the accuracy of the estimated expression values. Real data analysis demonstrates that the proposed method outperforms existing matrix-factorization-based imputation methods in recovering dropout entries and in preserving gene-to-gene and cell-to-cell relationships; it also performs well in downstream analyses such as cell clustering. For convenience, we have implemented the “NMFTL” method as R scripts, which are available at https://github.com/FocusPaka/NMFTL.
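To make the matrix-factorization idea concrete, the following is a minimal sketch of imputing zeros with a plain masked non-negative matrix factorization. It is not the NMFTL method: it omits both the graph-regularized terms and the transfer term, and it is written in Python with NumPy rather than taken from the authors' R scripts. The function name `masked_nmf_impute`, its parameters, and the toy data are hypothetical, and the sketch assumes that every zero is a dropout, which does not hold for real scRNA-seq counts.

```python
import numpy as np


def masked_nmf_impute(X, rank=2, n_iter=500, eps=1e-9, seed=0):
    """Fill zero entries of a non-negative genes-by-cells matrix X.

    Assumption for illustration only: every zero is treated as missing.
    The factors W and H are fitted on the observed entries alone, using
    multiplicative updates for the masked squared-error objective.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    M = (X > 0).astype(float)  # 1 = observed, 0 = treated as dropout
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_cells)) + eps
    for _ in range(n_iter):
        WH = W @ H
        W *= ((M * X) @ H.T) / ((M * WH) @ H.T + eps)
        WH = W @ H
        H *= (W.T @ (M * X)) / (W.T @ (M * WH) + eps)
    X_hat = W @ H
    # Keep observed values; use the low-rank estimate only where X was zero.
    imputed = np.where(M > 0, X, X_hat)
    return imputed, W, H


# Toy data (synthetic, not from the paper): a positive matrix with
# roughly 30% of its entries zeroed out to mimic dropout.
rng = np.random.default_rng(1)
truth = rng.random((30, 2)) @ rng.random((2, 20)) + 0.1
drop = rng.random(truth.shape) < 0.3
observed = truth.copy()
observed[drop] = 0.0

imputed, W, H = masked_nmf_impute(observed, rank=2)
```

The mask keeps the zeros from pulling the factors toward zero, which is the basic reason factorization-based imputation can recover dropout entries. A method along the lines described in the abstract would add further penalty terms to this objective.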