Computer science
Topic model
Data science
Information retrieval
Natural language processing
Authors
Dongcheng Zhang, Kunpeng Zhang, Yi Yang, David A. Schweidel
Identifier
DOI:10.25300/misq/2023/17885
Abstract
Online knowledge communities (OKCs), such as question-and-answer sites, have become increasingly popular venues for knowledge sharing. Accordingly, it is necessary for researchers and practitioners to develop effective and efficient text analysis tools to understand the massive amount of user-generated content (UGC) on OKCs. Unsupervised topic modeling has been widely adopted to extract human-interpretable latent topics embedded in texts. These identified topics can be further used in subsequent analysis and managerial practices. However, existing generic topic models that assume documents are independent are inappropriate for analyzing OKCs where structural relationships exist between questions and answers. Thus, a new method is needed to fill this research gap. In this study, we propose a new topic model specifically designed for the text in OKCs. We make three primary contributions to the research on topic modeling in this context. First, we build a general and flexible Bayesian framework to explicitly model structural and temporal dependencies among texts. Second, we statistically demonstrate the approximate model inference using mean-field and coordinate ascent algorithms. Third, we showcase the practical value and relative merit of our method via a specific downstream task (i.e., user profiling). The proposed model is illustrated using two real-world datasets from well-known OKCs (i.e., Stack Exchange and Quora), and extensive experiments demonstrate its superiority over several cutting-edge benchmarks.