计算机科学
情绪分析
人工智能
机器学习
重采样
文字嵌入
互联网
自然语言处理
深度学习
嵌入
集成学习
万维网
作者
Nassera Habbat,Nouri Hicham,Houda Anoun,Larbi Hassouni
标识
DOI:10.1016/j.engappai.2023.106999
摘要
The Internet is a crucial way to share information in both personal and professional areas. Sentiment analysis attracts great interest in marketing, research, and business today. The instability faced by imbalanced datasets on sentiment analysis is examined in this research. Balancing the datasets using techniques based on under-sampling and over-sampling is examined to achieve more efficient classification results as the effects of using BERT as word embedding and ensemble learning methods for classification. The effects of the resampling training set algorithms on different deep learning classifiers were investigated using BERT as a word embedding model and Cohen's kappa, accuracy, ROC-AUC curve, and MCC as evaluation metrics with k-fold validation on three sentiment analysis datasets containing English, Arabic, and Moroccan Arabic Dialect texts. Also, we did those performance metrics for all models when scaling the dataset for training and testing, and we calculated the memory and the execution time for each model. Finally, we analyzed the National Office of Railways of Morocco (ONCF) customers' Facebook comments in Modern Standard Arabic (MSA) and MD to determine customer satisfaction as positive, negative, and neutral comments.
科研通智能强力驱动
Strongly Powered by AbleSci AI