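Offensive
Computer science
Natural language processing
Speech recognition
Artificial intelligence
Linguistics
Engineering
Operations research
Philosophy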
Authors
Ahmed Cherif Mazari,Hamza Kheddar
Identifier
DOI:10.12785/ijcds/130177
Abstract
Toxicity and hate speech on social media platforms can lead to cyber-crime, affecting social life at both the personal and community level. Automatic detection of toxic and hateful content is therefore necessary to improve web content quality and to counter the spread of inappropriate speech through social media. The task is especially challenging when comments are written in complex languages such as Arabic, which is recognised for its difficulties and lack of resources. This paper introduces a new dataset for toxic text detection in the Algerian dialect: an annotated multi-label dataset of 14,150 comments extracted from Facebook, YouTube and Twitter and labelled as hate speech, offensive language and cyberbullying. To assess the practical utility of the annotated dataset, several tests were conducted with traditional machine learning (ML) classification models, namely Random Forest, Naïve Bayes, Linear Support Vector Classifier (SVC), Stochastic Gradient Descent (SGD) and Logistic Regression. Further assessments were conducted with Deep Learning (DL) models: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Bidirectional LSTM (Bi-LSTM) and Bidirectional GRU (Bi-GRU). In the experiments, the Bi-GRU model achieved the highest results for DL classification, with 73.6% accuracy and a 75.8% F1-score.