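Principal component analysis
Dimensionality reduction
Computer science
Sample (material)
Multidimensional scaling
Projection (relational algebra)
Nonlinear dimensionality reduction
Random projection
Transcriptome
Curse of dimensionality
Sample size determination
Data mining
Pattern recognition (psychology)
Computational biology
Artificial intelligence
Mathematics
Statistics
Biology
Algorithm
Genetics
Machine learning
Chemistry
Chromatography
Gene expression
Gene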
Authors
Yang Yang,Hongjian Sun,Yu Zhang,Tiefu Zhang,Jialei Gong,Yunbo Wei,Yonggang Duan,Minglei Shu,Yuchen Yang,Di Wu,Di Yu
Source
Journal: Cell Reports
[Elsevier]
Date: 2021-07-01
Volume (issue): 36 (4): 109442
Citations: 83
Identifiers
DOI:10.1016/j.celrep.2021.109442
Abstract
Transcriptomic analysis plays a key role in biomedical research. Linear dimensionality reduction methods, especially principal-component analysis (PCA), are widely used to detect sample-to-sample heterogeneity, while recently developed non-linear methods, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), can efficiently cluster heterogeneous samples in single-cell RNA sequencing analysis. Yet the application of t-SNE and UMAP to bulk transcriptomic analysis, and their comparison with conventional methods, have not been carried out. We compare four major dimensionality reduction methods (PCA, multidimensional scaling [MDS], t-SNE, and UMAP) in analyzing 71 large bulk transcriptomic datasets. UMAP is superior to PCA and MDS and also shows some advantages over t-SNE in differentiating batch effects, identifying pre-defined biological groups, and revealing in-depth clusters in two-dimensional space. Importantly, UMAP generates sample clusters that uncover biological features and clinical meaning. We recommend deploying UMAP to visualize and analyze sizable bulk transcriptomic datasets and so reinforce sample heterogeneity analysis.
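To make the linear baseline in this comparison concrete, here is a minimal pure-Python sketch of PCA that projects a samples x genes expression matrix onto its top two principal components, which is the kind of 2D sample map the abstract compares against MDS, t-SNE, and UMAP. It assumes the matrix is already normalized and log-transformed. The function names (`center`, `covariance`, `top_eigenvector`, `pca`) and the 6 x 3 toy matrix are hypothetical illustrations, not the authors' pipeline or data. It uses power iteration with deflation to find the components; a real analysis would use a library implementation such as scikit-learn's `PCA` and would typically filter genes first.

```python
import math
import random


def center(X):
    """Subtract each gene's (column's) mean so PCA reflects sample-to-sample variation."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def covariance(Xc):
    """Gene-by-gene sample covariance of a centered samples x genes matrix."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) / (n - 1)
             for b in range(p)] for a in range(p)]


def top_eigenvector(C, iters=500, seed=0):
    """Power iteration: dominant eigenvector of a symmetric PSD matrix and its eigenvalue."""
    rng = random.Random(seed)
    p = len(C)
    v = [rng.random() + 0.1 for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in the matrix
            return v, 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v is the eigenvalue for a unit-length v
    eigval = sum(v[a] * C[a][b] * v[b] for a in range(p) for b in range(p))
    return v, eigval


def pca(X, k=2):
    """Project samples onto the top-k principal components (2D plot coordinates when k=2)."""
    Xc = center(X)
    C = covariance(Xc)
    p = len(C)
    components = []
    for _ in range(k):
        v, lam = top_eigenvector(C)
        components.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next component
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return [[sum(row[j] * comp[j] for j in range(p)) for comp in components]
            for row in Xc]


# Hypothetical toy data: 6 samples x 3 genes, two groups that differ in genes 0 and 1
X = [
    [10.0, 10.0, 1.0],
    [10.5, 9.5, 1.2],
    [9.5, 10.2, 0.8],
    [1.0, 1.0, 1.1],
    [1.5, 0.8, 0.9],
    [0.5, 1.2, 1.0],
]
scores = pca(X, k=2)
for s in scores:
    print([round(x, 2) for x in s])
```

The signs of principal components are arbitrary, so only the relative positions of samples carry meaning. Here the first component separates the two toy groups, and that separation is exactly the structure that PCA and the non-linear methods are compared on.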