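Computer science
Transformer
Spectrogram
Speech recognition
Convolutional neural network
Artificial intelligence
Emotion recognition
Deep learning
Visualization
Pattern recognition (psychology)
Engineering
Electrical engineering
Voltage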
Source
Journal: Electronics
[MDPI AG]
Date: 2023-09-25
Volume (issue): 12 (19): 4034-4034
Citations: 4
Identifier
DOI:10.3390/electronics12194034
Abstract
The significance of emotion recognition technology continues to grow, and research in this field enables artificial intelligence to accurately understand and react to human emotions. This study aims to improve emotion recognition from speech by using dimensionality reduction algorithms for visualization, which helps outline emotion-specific audio features. For emotion recognition, we propose a new model architecture that combines a bidirectional long short-term memory (BiLSTM)–Transformer and a 2D convolutional neural network (CNN). The BiLSTM–Transformer processes audio features to capture the sequence of speech patterns, while the 2D CNN handles Mel-spectrograms to capture the spatial details of audio. The model is validated with 10-fold cross-validation. The proposed methodology was applied to Emo-DB and RAVDESS, two major speech emotion recognition databases, and achieved unweighted accuracy rates of 95.65% and 80.19%, respectively. These results indicate that the proposed transformer-based deep learning model, combined with appropriate feature selection, can enhance performance in emotion recognition from speech.
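The evaluation protocol named in the abstract (10-fold cross-validation scored by unweighted accuracy) can be illustrated with a minimal sketch. It assumes the definition of unweighted accuracy common in the speech emotion recognition literature, the mean of per-class recalls; the abstract does not spell out its own definition, and it does not say whether the folds were shuffled or stratified. The function names `unweighted_accuracy` and `k_fold_indices` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally.

    Assumed definition; the paper's abstract does not state its formula.
    """
    total = defaultdict(int)    # samples seen per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # Fold sizes differ by at most one sample.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size
```

With imbalanced classes this metric differs from plain accuracy: if 3 of 4 "angry" clips and 1 of 2 "sad" clips are classified correctly, plain accuracy is 4/6, while the mean per-class recall is (0.75 + 0.5) / 2 = 0.625.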