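Computer science
Artificial intelligence
Cluster analysis
Hierarchical clustering
Consensus clustering
Deep learning
Machine learning
Data mining
Fuzzy clustering
Canopy clustering algorithm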
Authors
Ruina Bai,Ruizhang Huang,Yanping Chen,Yongbin Qin,Yong Xu,Qinghua Zheng
Identifier
DOI:10.1016/j.inffus.2024.102507
Abstract
Document clustering, a fundamental task in natural language processing, aims to divide large collections of documents into meaningful groups based on their similarities. Multi-view document clustering (MvDC) has emerged as a promising approach, leveraging information from diverse views to improve clustering accuracy and robustness. However, existing multi-view clustering methods suffer from two issues: (1) a lack of inter-relations across documents during consensus semantic learning; (2) the neglect of consensus structure mining in multi-view document clustering. To address these issues, we propose a Hierarchical Consensus Learning model for Multi-view Document Clustering, termed MvDC-HCL. Our model incorporates two key modules. The Data-oriented Consensus Semantic Learning (CSeL) module focuses on learning consensus semantics across views by leveraging a hybrid contrastive consensus objective. The Task-oriented Consensus Structure Clustering (CStC) module employs a gated fusion network and clustering-driven structure contrastive learning to mine consensus structures effectively. Specifically, the CSeL module constructs a contrastive consensus learning objective based on intra-sample and inter-sample relationships in multi-view data, aiming to optimize the view semantic representations obtained by the semantic learner. This facilitates consistent semantic learning across the views of the same sample and consistent relationship learning among samples from different views. The learned view semantic representations are then fed into the fusion network of CStC to obtain fused sample semantic representations. Together with the view semantic representations, sample-level and view-level clustering structures are derived for consensus structure mining. Additionally, CStC introduces clustering-driven objectives to guide consensus structure mining and achieve consistent clustering results.
By hierarchically extracting implicit consensus semantics and structures within multi-view document data and tasks, MvDC-HCL significantly enhances clustering performance. Through comprehensive experiments, we demonstrate that the proposed model consistently outperforms state-of-the-art methods. Our code is publicly available at https://github.com/m22453/MvDC_HCRL.
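The abstract names two mechanisms without giving formulas: a contrastive objective that aligns the views of the same sample, and a gated network that fuses view representations. The sketch below illustrates both ideas in a generic form, assuming an InfoNCE-style loss and a softmax gate. It is not the authors' implementation; the function names, the `temperature` value, and the `gate_weights` matrix are hypothetical, and the inter-sample part of the hybrid objective is not reproduced.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def cross_view_info_nce(z_a, z_b, temperature=0.5):
    """Generic InfoNCE-style loss between two views (an assumed form).

    Row i of z_a and row i of z_b are treated as a positive pair (two views
    of the same document); every other row of z_b serves as a negative.
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature                    # (n, n) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def gated_fusion(views, gate_weights):
    """Fuse per-view representations with a per-sample softmax gate.

    views: list of V arrays, each of shape (n, d).
    gate_weights: hypothetical (V * d, V) matrix; in a real model it would be learned.
    Returns the fused (n, d) representation and the (n, V) gate values.
    """
    concat = np.concatenate(views, axis=1)                # (n, V * d)
    scores = concat @ gate_weights                        # (n, V)
    scores = scores - scores.max(axis=1, keepdims=True)
    gates = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    stacked = np.stack(views, axis=1)                     # (n, V, d)
    fused = (gates[:, :, None] * stacked).sum(axis=1)     # (n, d)
    return fused, gates
```

With two view embeddings of the same documents, `cross_view_info_nce(z_a, z_b)` should be lower when rows are correctly paired than when one view is shuffled, and `gated_fusion([z_a, z_b], w)` returns one fused vector per document that could then be passed to a clustering step.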