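Cluster analysis
Computer science
Pattern recognition (psychology)
Artificial intelligence
Dimensionality reduction
Spectral clustering
Feature vector
Feature (linguistics)
Convolutional neural network
Embedding
Clustering high-dimensional data
Similarity (geometry)
Image (mathematics)
Philosophy
Linguistics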
Authors
Han-Jing Jiang,Ya-Bing Huang,Qianpeng Li
Abstract
The development of single-cell RNA sequencing (scRNA-seq) has pushed past the limitations of bulk sequencing techniques in analyzing cell heterogeneity and diversity. Detecting clusters of cells is a key step in scRNA-seq analysis. However, high dimensionality and imbalance in the number of cells across cell subtypes are ubiquitous in real scRNA-seq data sets, which poses a major challenge for single-cell type detection. We propose a meta-learning-based model, SiaClust, which combines a Siamese convolutional neural network (CNN) with improved spectral clustering to detect cell types in scRNA-seq data. Specifically, a constrained sigmoid kernel maps the raw high-dimensional data to a low-dimensional space, and the Siamese CNN learns the differences between cell types in that low-dimensional feature space. The similarity matrix learned by the Siamese CNN is used together with improved spectral clustering and t-distributed stochastic neighbor embedding (t-SNE) for visualization. By comparing the similarity of samples, SiaClust highlights the differences between cell types while blurring the differences within cell types, which makes it better suited to high-dimensional and imbalanced data. In an extensive evaluation on data from nine different species and tissues generated with different scRNA-seq protocols, and in comparisons with state-of-the-art single-cell clustering models, SiaClust significantly improves clustering accuracy. More importantly, SiaClust accurately locates the exact sites of dropout genes and is more flexible with respect to data size and cell type.
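As a rough illustration of the kernel step mentioned in the abstract, the sketch below computes a pairwise similarity matrix with the standard sigmoid kernel, tanh(gamma * <x, y> + coef0). The abstract does not specify the constraint SiaClust places on the kernel, so this is the textbook form and not the authors' variant; the function names, the parameters `gamma` and `coef0`, and the toy expression values are all assumptions made for the example.

```python
import math


def sigmoid_kernel(x, y, gamma=0.1, coef0=0.0):
    """Standard sigmoid kernel tanh(gamma * <x, y> + coef0).

    Textbook form only; the constrained variant used by SiaClust
    is not described in the abstract.
    """
    dot = sum(a * b for a, b in zip(x, y))
    return math.tanh(gamma * dot + coef0)


def similarity_matrix(cells, gamma=0.1, coef0=0.0):
    """Pairwise kernel similarities between cells (rows of `cells`)."""
    return [
        [sigmoid_kernel(ci, cj, gamma, coef0) for cj in cells]
        for ci in cells
    ]


# Hypothetical toy expression profiles: rows are cells, columns are genes.
cells = [
    [5.0, 0.0, 1.0],  # cell 0, expresses mostly gene 0
    [4.0, 0.0, 2.0],  # cell 1, similar profile to cell 0
    [0.0, 6.0, 0.0],  # cell 2, expresses only gene 1
]

S = similarity_matrix(cells)
for row in S:
    print(["%.3f" % value for value in row])
```

In this toy input, cells 0 and 1 share expressed genes and so receive a high similarity, while cell 2 has no overlap with either and receives a similarity of zero. A matrix of this kind is what a spectral clustering step would take as its affinity input.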