光谱图
计算机科学
卷积神经网络
语音识别
人工智能
深度学习
话语
情绪识别
情感计算
模式识别(心理学)
作者
Shiqing Zhang,Xiaoming Zhao,Qi Tian
出处
期刊:IEEE Transactions on Affective Computing
[Institute of Electrical and Electronics Engineers]
日期:2022-04-01
卷期号:13 (2): 680-688
被引量:82
标识
DOI:10.1109/taffc.2019.2947464
摘要
Recently, emotion recognition in real sceneries such as in the wild has attracted extensive attention in affective computing, because existing spontaneous emotions in real sceneries are more challenging and difficult to identify than other emotions. Motivated by the diverse effects of different lengths of audio spectrograms on emotion identification, this paper proposes a multiscale deep convolutional long short-term memory (LSTM) framework for spontaneous speech emotion recognition. Initially, a deep convolutional neural network (CNN) model is used to learn deep segment-level features on the basis of the created image-like three channels of spectrograms. Then, a deep LSTM model is adopted on the basis of the learned segment-level CNN features to capture the temporal dependency among all divided segments in an utterance for utterance-level emotion recognition. Finally, different emotion recognition results, obtained by combining CNN with LSTM at multiple lengths of segment-level spectrograms, are integrated by using a score-level fusion strategy. Experimental results on two challenging spontaneous emotional datasets, i.e., the AFEW5.0 and BAUM-1s databases, demonstrate the promising performance of the proposed method, outperforming state-of-the-art methods.
科研通智能强力驱动
Strongly Powered by AbleSci AI