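Latent Dirichlet allocation
Social media
Topic model
Data science
Computational sociology
Context (archaeology)
Computer science
Non-negative matrix factorization
Strengths and weaknesses
Field (mathematics)
Sociology
Matrix decomposition
Artificial intelligence
World Wide Web
Psychology
Mathematics
Paleontology
Physics
Eigenvector
Biology
Pure mathematics
Social psychology
Quantum mechanics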
Authors
Roman Egger, Chung-En Yu
Identifier
DOI:10.3389/fsoc.2022.886498
Abstract
The richness of social media data has opened a new avenue for social science research to gain insights into human behaviors and experiences. In particular, emerging data-driven approaches relying on topic models provide entirely new perspectives on interpreting social phenomena. However, the short, text-heavy, and unstructured nature of social media content often leads to methodological challenges in both data collection and analysis. In order to bridge the developing field of computational science and empirical social research, this study aims to evaluate the performance of four topic modeling techniques, namely latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), Top2Vec, and BERTopic. In view of the interplay between human relations and digital media, this research takes Twitter posts as the reference point and assesses the performance of different algorithms concerning their strengths and weaknesses in a social science context. Based on certain details during the analytical procedures and on quality issues, this research sheds light on the efficacy of using BERTopic and NMF to analyze Twitter data.