Latent Dirichlet allocation
Topic model
Computer science
Artificial intelligence
Vocabulary
Natural language processing
Set (abstract data type)
Probabilistic latent semantic analysis
Generative grammar
Information retrieval
Probabilistic logic
Word (group theory)
Task (project management)
Prior probability
Domain (mathematical analysis)
Generative model
Semantics (computer science)
Bayesian probability
Economics
Philosophy
Mathematical analysis
Management
Programming language
Linguistics
Mathematics
Authors
Bahareh Harandizadeh, J. Hunter Priniski, Fred Morstatter
Source
Journal: Cornell University - arXiv
Date: 2022-02-11
Identifier
DOI:10.1145/3488560.3498518
Abstract
By illuminating latent structures in a corpus of text, topic models are an essential tool for categorizing, summarizing, and exploring large collections of documents. Probabilistic topic models, such as latent Dirichlet allocation (LDA), describe how words in documents are generated via a set of latent distributions called topics. Recently, the Embedded Topic Model (ETM) has extended LDA to utilize the semantic information in word embeddings to derive semantically richer topics. As LDA and its extensions are unsupervised models, they aren't defined to make efficient use of a user's prior knowledge of the domain. To this end, we propose the Keyword Assisted Embedded Topic Model (KeyETM), which equips ETM with the ability to incorporate user knowledge in the form of informative topic-level priors over the vocabulary. Using both quantitative metrics and human responses on a topic intrusion task, we demonstrate that KeyETM produces better topics than other guided, generative models in the literature.
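The abstract describes topics as latent distributions over the vocabulary, with each document generated as a mixture of topics. A minimal sketch of that idea, using plain LDA from scikit-learn on a toy corpus (this is standard LDA, not the paper's KeyETM; the corpus and parameter choices here are illustrative assumptions, not from the paper):

```python
# Sketch: plain LDA as in the abstract's description — each topic is a
# distribution over the vocabulary, each document a mixture of topics.
# Not the paper's KeyETM; corpus and settings are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats are common pets",
    "stocks fell as markets closed",
    "investors traded stocks and bonds",
]

# Bag-of-words counts over the vocabulary.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Fit LDA with two latent topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(counts)

# components_ holds one (unnormalized) word distribution per topic;
# doc_topic holds one topic-mixture vector per document.
print(lda.components_.shape)   # (n_topics, vocabulary_size)
print(doc_topic.shape)         # (n_documents, n_topics)
```

A guided model like KeyETM would additionally bias these topic-word distributions toward user-supplied keywords via informative priors; plain LDA, as shown here, has no such mechanism.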