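Computer science
Support vector machine
Artificial intelligence
Word2vec
Natural language processing
Classifier (UML)
Statement (logic)
Social media
Random forest
Machine learning
Speech recognition
Linguistics
Embedding
World Wide Web
Philosophy
Authors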
Mahamat Saleh Adoum Sanoussi,Xiaohua Chen,George K. Agordzo,Mahamed Lamine Guindo,Abdullah MMA Al Omari,Boukhari Mahamat Issa
Identifier
DOI:10.1109/ccwc54503.2022.9720792
Abstract
Identifying hate speech on social media has become increasingly crucial for society. Cyberbullying has been shown to significantly affect the social tranquillity of the Chadian population, mainly in places of conflict. This article aims to detect hate speech in texts written in a “lingua franca”, a mix of the local Chadian and French languages. The dataset consists of 14,000 comments extracted from the most visited Facebook pages and annotated in four categories (hate, offence, insult and neutral). The data were cleaned using Natural Language Processing (NLP) techniques and represented with three word embedding methods: Word2Vec, Doc2Vec, and FastText. Finally, four Machine Learning methods, namely Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbours (KNN), were used to classify the comments into these categories. The results showed that FastText feature representations as input to the SVM classifier performed best, with 95.4% accuracy for predicting comments containing insult statements, followed by hate statements at 93.9%. The results demonstrate that the model could be used to detect hate speech made by Chadians in social media texts.
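The general idea of turning comments into subword-based features and classifying them can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: it swaps trained FastText embeddings for plain character n-gram counts (the subword idea FastText builds on) and uses a cosine-similarity KNN, one of the four classifier families named in the abstract. The function names, the toy comments, and the parameters `n_min`, `n_max`, and `k` are all assumptions made for illustration.

```python
from collections import Counter
import math


def char_ngrams(text, n_min=3, n_max=5):
    """Yield character n-grams of each word, wrapped in '<' and '>' boundary markers."""
    for word in text.lower().split():
        marked = f"<{word}>"
        for n in range(n_min, n_max + 1):
            for i in range(len(marked) - n + 1):
                yield marked[i:i + n]


def featurize(text):
    """Sparse n-gram count vector (a stand-in for a learned embedding, not FastText itself)."""
    return Counter(char_ngrams(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train, text, k=3):
    """Return the majority label among the k training comments most similar to `text`."""
    query = featurize(text)
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy comments; the paper's 14,000-comment dataset is not reproduced here.
samples = [
    ("you are an idiot", "insult"),
    ("what an idiot you are", "insult"),
    ("such a stupid idiot", "insult"),
    ("have a nice day", "neutral"),
    ("thanks for the nice update", "neutral"),
    ("nice photo thanks", "neutral"),
]
train = [(featurize(text), label) for text, label in samples]

prediction = knn_predict(train, "thanks, have a nice evening")
```

Because the features are built from character n-grams rather than whole words, a token such as "thanks," still overlaps with "thanks" in the training comments, which is one reason subword representations may suit noisy, mixed-language social media text.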