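Computer science
Consistency (knowledge base)
Cluster analysis
Natural language processing
Graph
Feature (linguistics)
Document clustering
Artificial intelligence
Information retrieval
Cluster (spacecraft)
Linguistics
Theoretical computer science
Programming language
Philosophy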
Authors
Teng Sun, Zhenqiu Shu, Yuxin Huang, Hongbin Wang, Zhengtao Yu
Source
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Date: 2024-12-19
Abstract
Multilingual document clustering (MDC) aims to partition multilingual documents into distinct clusters by topic category in an unsupervised manner. However, existing MDC methods still suffer from several limitations in practical tasks. First, most of them optimize multiple objectives within the same feature space, which creates a conflict between learning consistent shared semantics and reconstructing inconsistent view-specific information. Second, several methods directly integrate information from multilingual documents during the fusion stage and thereby overlook the semantic differences between the features of different languages. To address these problems, we propose a novel multi-view learning method for multilingual document clustering, called Semantic Feature Graph Consistency with Contrastive Cluster Assignments (SFGC³A). Specifically, SFGC³A implements the consistency objective and the reconstruction objective in different feature spaces, thus avoiding the conflict between consistency learning and the reconstruction of inconsistent information. We then design semantic feature graph consistency and semantic label consistency modules to further explore the consistent semantic information among multilingual documents, thereby reducing the semantic differences among the language views. Extensive experiments on several multilingual document datasets demonstrate the effectiveness of SFGC³A in MDC tasks. The source code for this work will be released later.
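The abstract names contrastive cluster assignments but does not state the loss itself. As a rough illustration of the general idea only, the sketch below computes a generic InfoNCE-style loss between the soft cluster assignments of two language views. It is not the authors' SFGC³A objective; the function names `softmax` and `contrastive_cluster_loss`, the `temperature` parameter, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def contrastive_cluster_loss(q_a, q_b, temperature=0.5):
    """Hypothetical cross-view contrastive loss on cluster assignments.

    q_a, q_b: (N, K) soft assignments of the same N documents to K clusters,
    one matrix per language view. Each column is treated as one cluster's
    profile over all documents; the same column index in the other view is
    the positive, every other column is a negative.
    """
    a = q_a.T / np.linalg.norm(q_a.T, axis=1, keepdims=True)  # (K, N), unit rows
    b = q_b.T / np.linalg.norm(q_b.T, axis=1, keepdims=True)  # (K, N), unit rows
    sim = (a @ b.T) / temperature  # (K, K) scaled cosine similarities
    # Row-wise log-softmax; the diagonal holds the positive pairs.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    # Synthetic stand-in for two views of 200 documents and 4 clusters.
    rng = np.random.default_rng(0)
    logits = 3.0 * rng.normal(size=(200, 4))
    view_a = softmax(logits)
    view_b = softmax(logits + 0.1 * rng.normal(size=(200, 4)))
    print("aligned views:  ", contrastive_cluster_loss(view_a, view_b))
    print("permuted labels:", contrastive_cluster_loss(view_a, view_b[:, [1, 2, 3, 0]]))
```

Under these assumptions the loss is lower when the two views agree on which documents belong to each cluster than when the cluster indices of one view are permuted, which is the kind of cross-view agreement a contrastive assignment term is meant to reward.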