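Computer science
Semantics (computer science)
Speedup
Stream processing
Information retrieval
Theoretical computer science
Parallel computing
Programming language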
Authors
Zhiqi Lei,Liu Hai,Jiaxing Yan,Yanghui Rao,Qing Li
Source
Journal: IEEE Transactions on Knowledge and Data Engineering
[Institute of Electrical and Electronics Engineers]
Date: 2023-10-01
Volume (issue): 35 (10): 10616-10632
Citations: 3
Identifier
DOI: 10.1109/tkde.2023.3267496
Abstract
Lifelong topic modeling (LTM), which aims to mine high-quality topics by accumulating and utilizing semantic knowledge over a stream of documents, has attracted increasing attention recently. However, the permutation of topics may change over time, resulting in a semantic misalignment between the topic representations of document chunks across the stream. Such misalignment degrades model performance on various downstream tasks, yet it has been overlooked by existing lifelong topic models. To address this semantic misalignment, we formulate LTM as a non-negative matrix tri-factorization (NMTF) problem and propose a consolidation framework (i.e., NMTF-LTM) to enforce alignment in a mapped topic space. In addition, a distributed parallel algorithm, namely PNMTF-LTM, is developed to meet the real-time requirement of large-scale stream processing. Empirical results show that our method not only obtains a superior alignment of semantics without loss of topic quality, but also achieves effective speedup when deployed on a high-performance computing cluster.
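
For readers unfamiliar with non-negative matrix tri-factorization, the following is a minimal sketch of the generic technique only: it approximates a non-negative matrix X by a product U S Vᵀ of three non-negative factors using standard multiplicative updates for the Frobenius-norm objective. It is not the NMTF-LTM or PNMTF-LTM algorithm, whose objective, constraints, and alignment mechanism are not given in the abstract. The function name `nmtf`, its parameters, and the toy random matrix are all assumptions made for illustration.

```python
import numpy as np


def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Generic NMTF sketch: X (n x m) ~ U (n x k1) @ S (k1 x k2) @ V.T (k2 x m).

    Hypothetical helper for illustration; uses plain multiplicative updates
    that keep all factors non-negative. Not the method from the paper.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Random strictly positive initialization of the three factors.
    U = rng.random((n, k1)) + eps
    S = rng.random((k1, k2)) + eps
    V = rng.random((m, k2)) + eps

    # Track the Frobenius reconstruction error, starting with the initial one.
    errors = [np.linalg.norm(X - U @ S @ V.T)]
    for _ in range(n_iter):
        # Each update rescales one factor element-wise while the other two are fixed.
        U *= (X @ V @ S.T) / (U @ S @ V.T @ V @ S.T + eps)
        V *= (X.T @ U @ S) / (V @ S.T @ U.T @ U @ S + eps)
        S *= (U.T @ X @ V) / (U.T @ U @ S @ V.T @ V + eps)
        errors.append(np.linalg.norm(X - U @ S @ V.T))
    return U, S, V, errors


if __name__ == "__main__":
    # Toy non-negative matrix standing in for a document-by-word count chunk.
    X = np.random.default_rng(1).random((20, 30))
    U, S, V, errors = nmtf(X, k1=4, k2=4)
    print("initial error:", errors[0])
    print("final error:  ", errors[-1])
```

In a topic-modeling reading of this sketch, one might treat rows of X as documents and columns as words, with U and V as document-side and word-side factors and S as a small matrix linking the two latent spaces; that interpretation is an assumption here, not a description of the paper's model.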