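Cluster analysis
Sensor fusion
Data mining
Computer science
Artificial intelligence
Hierarchical clustering
Fusion
Machine learning
Philosophy
Linguistics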
Authors
Sha Tian, Ying Yang, Yushan Qiu, Quan Zou
Identifier
DOI:10.1109/tcbb.2024.3353335
Abstract
Clustering is a common technique for statistical data analysis and is essential for developing precision medicine. Numerous computational methods have been proposed for integrating multi-omics data to identify cancer subtypes. However, most existing clustering models based on network fusion fail to preserve the consistency of the data distribution before and after fusion. Motivated by this observation, we measure and minimize the distribution difference between networks, which may not lie in the same space, to improve the performance of data fusion. We propose SMCC, a flexible network fusion model for clustering single- and multi-omics data to identify cancer or cell subtypes, based on co-regularized network fusion. SMCC integrates low-rank subspace representation and entropy to fuse networks, and it uses co-regularization to measure and minimize the distribution difference between the similarity networks and the fusion network. The model can thus both reduce noise interference in the source data and bring the statistical characteristics of the fusion result closer to those of the source data. We evaluated the clustering performance of SMCC on 16 real single- and multi-omics datasets. The experimental results showed that SMCC outperforms 17 state-of-the-art clustering methods. Moreover, it is effective for identifying cancer or cell subtypes, thereby promoting the development of precision medicine.
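The idea of comparing each source similarity network with the fused network can be illustrated with a small sketch. This is not the SMCC model: the abstract does not give its objective, so the Gaussian-kernel networks, the plain-average fusion, and the squared Frobenius disagreement below are all assumptions chosen for illustration, and every function name is hypothetical. The sketch only shows what "distribution difference between the similarity networks and the fusion network" could look like as a number that a co-regularization term would try to keep small.

```python
import numpy as np


def gaussian_similarity(X, sigma=1.0):
    """Dense Gaussian-kernel similarity network over the rows of X (assumed kernel)."""
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def row_normalize(W):
    """Scale each row to sum to 1 so networks from different views are comparable."""
    return W / W.sum(axis=1, keepdims=True)


def fuse_average(networks):
    """Placeholder fusion: plain average of the networks (NOT the SMCC fusion step)."""
    return sum(networks) / len(networks)


def disagreement(W, F):
    """Squared Frobenius distance between two row-normalized networks (assumed measure)."""
    diff = row_normalize(W) - row_normalize(F)
    return float(np.linalg.norm(diff, "fro") ** 2)


# Two synthetic "omics" views of the same 20 samples, with different feature counts.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 10)
view_a = rng.normal(size=(20, 5)) + 3.0 * labels[:, None]
view_b = rng.normal(size=(20, 8)) + 3.0 * labels[:, None]

nets = [gaussian_similarity(view_a, sigma=2.0),
        gaussian_similarity(view_b, sigma=2.0)]
fused = fuse_average(nets)

# One disagreement value per source network; a co-regularized model would
# penalize these terms while learning the fused network.
gaps = [disagreement(W, fused) for W in nets]
print(gaps)
```

Both views have different feature dimensions, yet their similarity networks share the same sample-by-sample shape, which is what makes a network-level comparison possible even when the raw data are not in the same space.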