Shakespeare Machine: New AI-Based Technologies for Textual Analysis
Computer science
Artificial intelligence
Art
Natural language processing
Authors
Carl Ehrett, Lucian Ghita, Damayanthi Ranwala, Andrew Menezes
Source
Journal: Digital Scholarship in the Humanities [Oxford University Press] Date: 2024-06-04
Identifier
DOI:10.1093/llc/fqae021
Abstract
This article demonstrates a method that uses tools from Natural Language Processing (NLP) to aid in analyzing theatrical texts and similar works. The method deploys pre-trained large language models to gather metadata for a text, and these metadata are amenable to downstream statistical analyses that surface patterns of interest in character dialogue. We focus specifically on Shakespeare’s works, collecting sentiment and emotion scores for each line of his plays. In addition to the scores produced by NLP models, we directly gather metadata such as genre, line length, and character gender. We show how these metadata can illuminate patterns in Shakespearean character that may be difficult to detect from a direct reading of the texts. In particular, we use them to expose statistically significant relationships in Shakespeare between character gender and the emotional content of that character’s dialogue, controlling for genre. We also present the publicly available dataset that we compiled to perform these analyses. The dataset collects text from Shakespeare’s plays along with a variety of metadata useful for this and other forms of analysis of Shakespeare’s works. The methodology demonstrated here may be extended to other varieties of metadata provided by large NLP models.
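As a rough illustration of the kind of downstream analysis the abstract describes, the sketch below compares mean sentiment by character gender within each genre, which is one simple way of holding genre fixed. It is not the authors' code: the record layout, the `gender_gap_by_genre` function, and every score are invented placeholders, and the sketch reports only descriptive differences rather than the significance tests mentioned in the abstract.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-line records: (genre, speaker_gender, sentiment_score).
# All values are invented placeholders, not taken from the published dataset.
LINES = [
    ("comedy", "female", 0.50),
    ("comedy", "female", 0.25),
    ("comedy", "male", 0.25),
    ("comedy", "male", 0.0),
    ("tragedy", "female", -0.25),
    ("tragedy", "female", -0.5),
    ("tragedy", "male", -0.5),
    ("tragedy", "male", -0.75),
]


def gender_gap_by_genre(records):
    """Return {genre: mean female score minus mean male score}."""
    # Group scores first by genre, then by gender, so that each
    # comparison is made between lines of the same genre.
    scores = defaultdict(lambda: defaultdict(list))
    for genre, gender, score in records:
        scores[genre][gender].append(score)

    gaps = {}
    for genre, by_gender in scores.items():
        # Skip genres that lack lines for either gender.
        if "female" in by_gender and "male" in by_gender:
            gaps[genre] = mean(by_gender["female"]) - mean(by_gender["male"])
    return gaps


if __name__ == "__main__":
    for genre, gap in sorted(gender_gap_by_genre(LINES).items()):
        print(f"{genre}: {gap:+.3f}")
```

Stratifying by genre keeps a genre effect (for example, tragedies scoring lower overall) from being mistaken for a gender effect; a regression with genre as a covariate would be a more general alternative.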