Topic Modeling on Document Networks With Dirichlet Optimal Transport Barycenter

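Keywords: Computer science; Latent Dirichlet allocation; Topic model; Interpretability; Information retrieval; Semantics (computer science); Leverage (statistics); Artificial intelligence; Theoretical computer science; Natural language processing; Programming language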
Authors
Delvin Ce Zhang, Hady W. Lauw
Source
Journal: IEEE Transactions on Knowledge and Data Engineering [Institute of Electrical and Electronics Engineers]
Volume (issue): 36 (3): 1328-1340 Citations: 5
Identifier
DOI:10.1109/tkde.2023.3303465
Abstract

Text documents are often interconnected in a network structure, e.g., academic papers via citations and Web pages via hyperlinks. On the one hand, although Graph Neural Networks (GNNs) have shown promising ability to derive effective embeddings for such networked documents, they do not assume a latent topic structure and therefore produce uninterpretable embeddings. On the other hand, topic models can infer semantically interpretable topic distributions for documents by associating each topic with a group of understandable keywords. However, most topic models focus on the plain text within documents and fail to leverage the network structure across documents. Network connectivity reveals topic similarity between linked documents, and modeling it could uncover meaningful semantics. Motivated by the above two challenges, in this paper we propose a GNN-based neural topic model that both captures network connectivity and derives semantically interpretable topic distributions for networked documents. For network modeling, we build the model on the theory of Optimal Transport Barycenter, which captures network structure by allowing the topic distribution of a document to generate the content of its linked neighbors. For semantic interpretability, we extend optimal transport by incorporating semantically related words in the embedding space. Since the Dirichlet prior in Latent Dirichlet Allocation successfully improves topic quality, we also analyze the Dirichlet distribution as an optimal transport prior to improve topic interpretability. We design rejection sampling to simulate the Dirichlet distribution. Extensive experiments on document classification, clustering, link prediction, and topic analysis verify the effectiveness of our model.
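
The abstract mentions simulating the Dirichlet distribution with rejection sampling but does not give the procedure. As an illustration only, the sketch below shows one well-known way to do this: draw Gamma variates with the Marsaglia-Tsang rejection sampler and normalize them. Whether the paper uses this exact scheme is an assumption, and the function names `gamma_rejection` and `sample_dirichlet` are hypothetical.

```python
import math
import random


def gamma_rejection(alpha, rng):
    """Draw one Gamma(alpha, 1) variate by Marsaglia-Tsang rejection sampling.

    Illustrative only; not taken from the paper.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if alpha < 1.0:
        # Boost trick: Gamma(alpha) = Gamma(alpha + 1) * U^(1/alpha)
        u = 1.0 - rng.random()  # uniform on (0, 1]
        return gamma_rejection(alpha + 1.0, rng) * u ** (1.0 / alpha)
    d = alpha - 1.0 / 3.0
    c = 1.0 / math.sqrt(9.0 * d)
    while True:
        x = rng.gauss(0.0, 1.0)       # Gaussian proposal
        v = (1.0 + c * x) ** 3
        if v <= 0.0:
            continue                  # outside the support: reject
        u = 1.0 - rng.random()        # uniform on (0, 1], so log(u) is defined
        if math.log(u) < 0.5 * x * x + d - d * v + d * math.log(v):
            return d * v              # accept


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet(alphas) sample by normalizing independent Gamma draws."""
    gammas = [gamma_rejection(a, rng) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


if __name__ == "__main__":
    rng = random.Random(0)
    # A sparse prior (all alphas < 1) tends to concentrate mass on few topics.
    print(sample_dirichlet([0.5, 0.5, 0.5], rng))
```

Each sample is a valid topic proportion vector: its entries are non-negative and sum to one. In a neural topic model the accepted Gaussian proposal would typically also be used to pass gradients to the Dirichlet parameters, which this sketch does not attempt.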
