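Computer science
Convolutional neural network
Emotion recognition
Python (programming language)
Speech recognition
Realization (probability)
Artificial intelligence
Pattern recognition (psychology)
Operating system
Statistics
Mathematics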
Authors
Mohammad Reza Falahzadeh,Edris Zaman Farsa,Ali Harimi,Arash Ahmadi,Ajith Abraham
Source
Journal: IEEE Access
[Institute of Electrical and Electronics Engineers]
Date: 2022-01-01
Volume/Issue: 10: 112460-112471
Citations: 9
Identifiers
DOI:10.1109/access.2022.3217226
Abstract
Convolutional neural networks (CNNs) are widely used in industry and academia because of their high precision and their remarkable ability to solve intricate problems. Speech emotion recognition is an interesting application of CNNs in the field of audio processing. In this paper, a speech emotion recognition system based on a 3D CNN is proposed to analyze and classify emotions. In the proposed method, the three-dimensional reconstructed phase spaces of the speech signals are calculated, and the emotion-related patterns formed in these spaces are converted into 3D tensors. A 3D CNN applied to two datasets, EMO-DB and eNTERFACE05, using a speaker-independent technique achieved 90.40% and 82.20% accuracy, respectively. By employing gender recognition, the accuracy increased to 94.42% on EMO-DB and to 88.47% on eNTERFACE05. The realization of the proposed 3D CNN on both an Intel CPU and an NVIDIA GPU is also explored. The results, both with and without gender recognition, show that GPU-based execution is faster than CPU-based execution (using Python) on both the EMO-DB and eNTERFACE05 datasets.
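The abstract describes calculating a three-dimensional reconstructed phase space of a speech signal and converting the resulting patterns into a 3D tensor. The sketch below illustrates one common way this could be done, using time-delay embedding followed by a voxel count on a regular grid. It is an assumption-laden illustration, not the authors' implementation: the abstract does not state the delay, the grid resolution, or the normalization, so the function names (`reconstruct_phase_space`, `points_to_tensor`) and the parameters `delay` and `bins` are hypothetical.

```python
import math


def reconstruct_phase_space(signal, delay):
    """Time-delay embedding: map x[n] to (x[n], x[n+delay], x[n+2*delay]).

    `delay` is a hypothetical parameter; the abstract does not give its value.
    """
    if delay < 1:
        raise ValueError("delay must be a positive integer")
    count = max(len(signal) - 2 * delay, 0)
    return [
        (signal[n], signal[n + delay], signal[n + 2 * delay])
        for n in range(count)
    ]


def points_to_tensor(points, bins):
    """Count phase-space points per voxel on a bins x bins x bins grid.

    The grid spans the global min/max of all coordinates (an assumed
    normalization, chosen only to keep the sketch simple).
    """
    tensor = [[[0] * bins for _ in range(bins)] for _ in range(bins)]
    if not points:
        return tensor
    lo = min(min(p) for p in points)
    hi = max(max(p) for p in points)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant signal
    for p in points:
        # Scale each coordinate to a voxel index, clamping the maximum value
        # into the last voxel.
        i, j, k = (min(int((c - lo) / span * bins), bins - 1) for c in p)
        tensor[i][j][k] += 1
    return tensor


# Toy input: a sine wave standing in for a short speech frame.
signal = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
points = reconstruct_phase_space(signal, delay=10)
tensor = points_to_tensor(points, bins=16)
```

A tensor built this way has a fixed shape regardless of the signal length, which is what would allow it to be fed to a 3D convolutional network; the choice of voxel counts over, say, binary occupancy is a design assumption made here for illustration.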