Mahamat Saleh Adoum Sanoussi, Xiaohua Chen, George K. Agordzo, Mahamed Lamine Guindo, Abdullah MMA Al Omari, Boukhari Mahamat Issa
Identifiers
DOI:10.1109/ccwc54503.2022.9720792
Abstract
Identifying hate speech on social media has become increasingly crucial for society. Cyberbullying has been shown to significantly affect the social tranquillity of the Chadian population, mainly in places of conflict. This article aims to detect hate speech in texts written in a "lingua franca" that mixes the local Chadian and French languages. The dataset used for this study consists of 14,000 comments extracted from the most visited Facebook pages and annotated in four categories (hate, offence, insult and neutral). The data were cleaned using Natural Language Processing (NLP) techniques and then represented with three word embedding methods: Word2Vec, Doc2Vec, and FastText. Finally, four Machine Learning methods, namely Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbours (KNN), were used to classify the comments into the different categories. The results showed that FastText feature representations as input to the SVM classifier performed best, with 95.4% accuracy in predicting comments containing insults, followed by 93.9% for comments containing hate statements. The results demonstrate that our model could be used to detect hate speech made by Chadians in social media texts.
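
The pipeline described in the abstract (clean the comment, turn it into a vector, classify it) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard-library-only setup, replaces trained FastText embeddings with hashed FastText-style character n-grams, and uses a small cosine-similarity KNN in place of the SVM. The names `clean`, `char_ngrams`, `featurize`, `knn_predict`, the bucket count `DIM`, and the toy comments are all hypothetical and chosen only for illustration.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # number of hash buckets; a hypothetical choice, not from the paper


def clean(text):
    """Lowercase, drop URLs and punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so accents survive
    return " ".join(text.split())


def char_ngrams(word, n_min=3, n_max=5):
    """FastText-style subwords: n-grams of the word wrapped in < and >."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]


def featurize(text):
    """Hash subword n-grams into a fixed-size, L2-normalised count vector."""
    vec = [0.0] * DIM
    for word in clean(text).split():
        for gram in char_ngrams(word):
            vec[zlib.crc32(gram.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def knn_predict(train, text, k=3):
    """Majority vote among the k most cosine-similar training comments."""
    query = featurize(text)
    scored = sorted(
        ((sum(a * b for a, b in zip(query, vec)), label) for vec, label in train),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Invented placeholder comments; the real dataset has 14,000 annotated comments.
toy_comments = [
    ("you are a complete fool", "insult"),
    ("what a stupid fool you are", "insult"),
    ("have a nice day everyone", "neutral"),
    ("thank you for sharing this news", "neutral"),
]
train = [(featurize(text), label) for text, label in toy_comments]

print(knn_predict(train, "such a fool", k=1))
```

Character n-grams are used here because subword features tolerate the spelling variation typical of mixed-language social media text. In practice, a trained embedding model and an SVM from an established library would replace the hashing and KNN steps.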