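Computer science
Support vector machine
Artificial intelligence
Classifier (UML)
Social media
Natural language processing
Naive Bayes classifier
Ideology
Baseline (sea)
Natural language
Machine learning
World Wide Web
Geology
Oceanography
Law
Politics
Political science
Authors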
Kamalakkannan Ravi,Adan Vela,Rickard Ewetz
Identifier
DOI:10.1109/icmla55696.2022.00066
Abstract
With the long-term goal of understanding how language is used and evolves within online communities, this work explores the application of natural language processing techniques to classify text articles according to their ideological orientation (i.e., conservative or liberal). We first collect a balanced corpus of text articles posted to the online communities r/Liberal and r/Conservative on the social media website Reddit. Using the corpus, we develop and apply three classifiers. The baseline classifier is a Bayes model that accounts for each text article’s web domain; as such, its classification is independent of content. Next, we develop a support vector machine (SVM) model with term frequency-inverse document frequency (TF-IDF) features; this approach highlights differences in language, using a count-based feature space to differentiate text articles. Last, we evaluate the context-based transformer (RoBERTa) model and discuss its underperformance relative to the baseline and SVM models.
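The domain-only baseline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `DomainBayes`, the helper `domain_of`, the Laplace smoothing parameter `alpha`, and the toy URLs on reserved `.example` hosts are all assumptions made for illustration. The sketch shows only the general idea of a Bayes classifier whose single feature is the web domain of an article, so its prediction never looks at the article text.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def domain_of(url):
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


class DomainBayes:
    """Bayes classifier whose only feature is the article's web domain.

    Hypothetical sketch: the smoothing scheme is an assumption, not
    something stated in the abstract.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing strength (assumed)
        self.label_counts = Counter()             # articles seen per label
        self.domain_counts = defaultdict(Counter) # label -> domain -> count
        self.domains = set()                      # all domains seen in training

    def fit(self, urls, labels):
        for url, label in zip(urls, labels):
            d = domain_of(url)
            self.label_counts[label] += 1
            self.domain_counts[label][d] += 1
            self.domains.add(d)
        return self

    def posterior(self, url):
        """Return P(label | domain) for every label, normalized to sum to 1."""
        d = domain_of(url)
        total = sum(self.label_counts.values())
        vocab = len(self.domains) + 1  # one extra slot for unseen domains
        scores = {}
        for label, n in self.label_counts.items():
            prior = n / total
            likelihood = (self.domain_counts[label][d] + self.alpha) / (
                n + self.alpha * vocab
            )
            scores[label] = prior * likelihood
        norm = sum(scores.values())
        return {label: s / norm for label, s in scores.items()}

    def predict(self, url):
        post = self.posterior(url)
        return max(post, key=post.get)


# Toy training data: made-up URLs, not taken from the paper's corpus.
train_urls = [
    "https://www.left.example/a",
    "https://left.example/b",
    "https://right.example/c",
    "https://www.right.example/d",
    "https://news.example/e",
]
train_labels = ["liberal", "liberal", "conservative", "conservative", "liberal"]

model = DomainBayes().fit(train_urls, train_labels)
print(model.predict("https://left.example/some-article"))
print(model.posterior("https://unseen.example/story"))
```

Because the only input is the domain, two articles from the same site always receive the same label, which is what makes such a model a content-independent point of comparison for text-based classifiers like the SVM and RoBERTa models.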