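Biology
Multifactor dimensionality reduction
Gene
Gene expression
Dimensionality reduction
Imputation (statistics)
Computational biology
Genetics
Genotype
Single-nucleotide polymorphism
Artificial intelligence
Computer science
Missing data
Machine learning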
Authors
Rebecca Elyanow, Bianca Dumitrascu, Barbara E. Engelhardt, Benjamin J. Raphael
Source
Journal: Genome Research
[Cold Spring Harbor Laboratory Press]
Date: 2020-01-28
Volume (issue), pages: 30 (2), 195-204
Citations: 90
Identifier
DOI: 10.1101/gr.251603.119
Abstract
Single-cell RNA-sequencing (scRNA-seq) enables high-throughput measurement of RNA expression in single cells. However, because of technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells in a lower-dimensional space, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc learns a low-dimensional representation of scRNA-seq transcript counts using network-regularized non-negative matrix factorization. The network regularization takes advantage of prior knowledge of gene-gene interactions, encouraging pairs of genes with known interactions to be nearby each other in the low-dimensional representation. The resulting matrix factorization imputes gene abundance for both zero and nonzero counts and can be used to cluster cells into meaningful subpopulations. We show that netNMF-sc outperforms existing methods at clustering cells and estimating gene-gene covariance using both simulated and real scRNA-seq data, with increasing advantages at higher dropout rates (e.g., >60%). We also show that the results from netNMF-sc are robust to variation in the input network, with more representative networks leading to greater performance gains.
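To make the idea of network-regularized non-negative matrix factorization concrete, the following is a minimal sketch, not the authors' netNMF-sc implementation. It assumes a squared-error objective, ||X - WH||^2 + lam * trace(W^T L W), where L = D - A is the graph Laplacian of a gene-gene adjacency matrix A, and it uses generic multiplicative updates from the graph-regularized NMF literature. The function names, the parameter names (k, lam, n_iter) and the choice of optimizer are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def graph_regularized_nmf(X, A, k, lam=1.0, n_iter=200, eps=1e-10, seed=0):
    """Factor X (genes x cells) as W @ H with a gene-network penalty on W.

    A is a symmetric, non-negative gene-gene adjacency matrix (assumed input).
    The penalty pulls the latent rows of W for connected genes toward each other.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k)) + eps   # gene factors
    H = rng.random((k, n_cells)) + eps   # cell factors
    D = np.diag(A.sum(axis=1))           # degree matrix of the gene network
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        W *= (X @ H.T + lam * (A @ W)) / (W @ (H @ H.T) + lam * (D @ W) + eps)
        H *= (W.T @ X) / ((W.T @ W) @ H + eps)
    return W, H

def objective(X, A, W, H, lam):
    """Reconstruction error plus the Laplacian penalty on the gene factors."""
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.norm(X - W @ H) ** 2 + lam * np.trace(W.T @ L @ W)
```

In this sketch, the product W @ H plays the role of the imputed expression matrix (it assigns values to both zero and nonzero entries of X), and the columns of H give a low-dimensional representation of the cells that could be passed to any clustering method.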