Improving topic modeling for literary studies: a hybrid model combined with Word2Vec visualization in the case of Robinson Crusoe
Word2vec
Visualization
Computer science
Artificial intelligence
Information retrieval
Embedding
Author
Haifeng Hui
Source
Journal: Digital Scholarship in the Humanities [Oxford University Press] Date: 2025-02-12
Identifier
DOI:10.1093/llc/fqaf002
Abstract
Topic modeling techniques, initially developed for the analysis of short texts, often face challenges when applied to literary research because of the complexity of literary language and the length of the texts. Algorithms that typically yield clear and distinct topics for concise informative or opinionated texts often produce ambiguous and overlapping results in literary contexts. This article explores the application of one of the most popular topic modeling techniques, latent Dirichlet allocation (LDA), to the analysis of fiction and addresses central questions regarding the effectiveness and interpretation of LDA topics through a case study of Robinson Crusoe. It proposes combining the Word2Vec method with LDA analysis to render topic modeling results more readable by mapping topic words in a three-dimensional space where semantically related words are placed close to each other. The integrated approach is then validated on various children’s editions of the novel and on other works by the same author to assess its effectiveness. The combined method proves capable of differentiating subtle changes in the children’s editions and the other novels. This study highlights the promising potential of LDA in literary research and underscores the importance of visualization techniques for nuanced interpretations of LDA topics.
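The idea of placing topic words in a three-dimensional space so that related words sit close together can be illustrated with a small, dependency-free sketch. It is not the article's pipeline: the toy sentences and the two topic word lists are invented for illustration, count-based co-occurrence vectors stand in for trained Word2Vec embeddings, and a hand-rolled PCA (power iteration) stands in for whatever 3D projection the article uses. In practice the topic lists would presumably come from an LDA model and the vectors from a Word2Vec model, for example via gensim's `LdaModel` and `Word2Vec`.

```python
import math
from itertools import combinations

# Toy tokenized "sentences" (hypothetical; not taken from the novel's text).
SENTENCES = [
    ["ship", "sea", "storm", "voyage"],
    ["ship", "sea", "voyage", "captain"],
    ["storm", "sea", "ship", "captain"],
    ["island", "cave", "goat", "corn"],
    ["island", "goat", "corn", "cave"],
    ["cave", "corn", "island", "goat"],
]

# Hypothetical topic word lists, standing in for the top words of two LDA topics.
TOPICS = {
    "seafaring": ["ship", "sea", "storm"],
    "island_life": ["island", "goat", "corn"],
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cooccurrence_vectors(sentences):
    """Count-based word vectors (a stand-in for Word2Vec embeddings)."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for sent in sentences:
        for a in sent:
            for b in sent:
                if a != b:
                    vectors[a][index[b]] += 1.0
    return vectors


def principal_axes(rows, k=3, iters=200):
    """Top-k principal directions of centered rows via power iteration.

    Assumes the data has at least k directions of nonzero variance.
    """
    dim = len(rows[0])
    axes = []
    for c in range(k):
        v = [1.0 / (i + 1 + c) for i in range(dim)]  # deterministic start
        for _ in range(iters):
            scores = [dot(r, v) for r in rows]
            # w = X^T X v, computed row by row
            w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(dim)]
            for u in axes:  # deflate: drop directions already found
                d = dot(w, u)
                w = [wi - d * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        axes.append(v)
    return axes


def project_3d(vectors):
    """Center the vectors and project each word onto the top three axes."""
    words = list(vectors)
    dim = len(vectors[words[0]])
    mean = [sum(vectors[w][j] for w in words) / len(words) for j in range(dim)]
    centered = {w: [x - m for x, m in zip(vectors[w], mean)] for w in words}
    axes = principal_axes(list(centered.values()))
    return {w: [dot(centered[w], a) for a in axes] for w in words}


def mean_distances(coords, topics):
    """Mean 3D distance between same-topic and different-topic word pairs."""
    topic_of = {w: t for t, ws in topics.items() for w in ws}
    within, between = [], []
    for a, b in combinations(topic_of, 2):
        d = math.dist(coords[a], coords[b])
        (within if topic_of[a] == topic_of[b] else between).append(d)
    return sum(within) / len(within), sum(between) / len(between)


coords = project_3d(cooccurrence_vectors(SENTENCES))
topic_coords = {w: coords[w] for ws in TOPICS.values() for w in ws}
within, between = mean_distances(topic_coords, TOPICS)
print(f"mean within-topic distance:  {within:.2f}")
print(f"mean between-topic distance: {between:.2f}")
```

The `topic_coords` dictionary holds one (x, y, z) triple per topic word, which is the shape of data a 3D scatter plot would need. On this toy corpus, words from the same topic land closer together than words from different topics, which is the property the visualization relies on.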