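Computer science
Convolutional neural network
Deep learning
Artificial intelligence
Spectrogram
Representation (politics)
Feature learning
Block (permutation group theory)
Classifier (UML)
Pattern recognition (psychology)
Speech recognition
Geometry
Political science
Mathematics
Politics
Law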
Authors
Jiaxing Liu, Zhilei Liu, Longbiao Wang, Lili Guo, Jianwu Dang
Identifier
DOI: 10.1109/icassp40776.2020.9053192
Abstract
Convolutional neural network (CNN) based deep representation learning methods for speech emotion recognition (SER) have demonstrated great success. However, the basic design of a CNN restricts it to modeling only local information well. The capsule network (CapsNet) can overcome this shortcoming of CNNs by capturing shallow global features from the spectrogram, but CapsNet cannot learn local or deep global information. In this paper, we propose a local-global aware deep representation learning system that consists mainly of two modules. One module contains a multi-scale CNN, the time-frequency CNN (TFCNN), to learn the local representation. In the other module, we introduce a structure of multiple densely connected blocks to learn shallow and deep global information. Every block in this structure is a complete CapsNet improved by a new routing algorithm. The local and global representations are fed to the classifier, achieving an absolute improvement of at least 4.25% over the benchmark methods on IEMOCAP.
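The local-global data flow described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions. The abstract does not specify the "new routing algorithm", so standard routing-by-agreement from Sabour et al. (2017) is swapped in as a stand-in. The TFCNN is replaced by a random vector. All sizes (8 primary capsules of dimension 16, 3 dense blocks of 4 capsules each, 4 emotion classes) and every weight are hypothetical and untrained. Function names such as `dense_capsule_global` are illustrative only. The sketch shows three ideas: each block is a full capsule layer, each block receives the primary capsules plus the outputs of all earlier blocks (dense connections), and the outputs of shallow and deep blocks are concatenated with the local features before classification.

```python
import numpy as np

rng = np.random.default_rng(0)

def squash(s, axis=-1, eps=1e-8):
    # Standard CapsNet nonlinearity: keeps a vector's direction, maps its length into [0, 1).
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, iterations=3):
    # Routing-by-agreement (Sabour et al., 2017), a stand-in for the paper's unspecified routing.
    # u_hat: predictions of shape (num_in, num_out, dim); returns (num_out, dim).
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                # routing logits
    for _ in range(iterations):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c = c / c.sum(axis=1, keepdims=True)       # softmax over output capsules
        s = np.einsum("io,iod->od", c, u_hat)      # coupling-weighted sum of predictions
        v = squash(s)                              # output capsules
        b = b + np.einsum("iod,od->io", u_hat, v)  # raise logits where predictions agree with v
    return v

def capsule_block(u, W, iterations=3):
    # One complete capsule layer: u is (num_in, dim), W is (num_in, num_out, dim, dim).
    u_hat = np.einsum("iodk,ik->iod", W, u)        # each input capsule predicts each output capsule
    return dynamic_routing(u_hat, iterations)

def dense_capsule_global(primary, num_blocks=3, caps_per_block=4):
    # Hypothetical dense structure: block l sees the primary capsules plus the
    # outputs of every earlier block, concatenated along the capsule axis.
    dim = primary.shape[1]
    features = [primary]
    outputs = []
    for _ in range(num_blocks):
        x = np.concatenate(features, axis=0)
        W = rng.normal(scale=0.1, size=(x.shape[0], caps_per_block, dim, dim))  # untrained weights
        v = capsule_block(x, W)
        features.append(v)
        outputs.append(v.ravel())
    # Early (shallow) and late (deep) block outputs together form the global representation.
    return np.concatenate(outputs)

def classify(local_feat, global_feat, num_classes=4):
    # Fuse local and global representations, then apply a random linear layer and softmax.
    z = np.concatenate([local_feat, global_feat])
    W = rng.normal(scale=0.1, size=(num_classes, z.size))
    logits = W @ z
    p = np.exp(logits - logits.max())
    return p / p.sum()

primary = squash(rng.normal(size=(8, 16)))  # hypothetical primary capsules from a spectrogram
local = rng.normal(size=64)                 # hypothetical stand-in for the TFCNN output
g = dense_capsule_global(primary)
probs = classify(local, g)
print(g.shape, probs.round(3))
```

Concatenating along the capsule axis keeps every capsule the same dimension, so each later block can route over all earlier outputs. Here the global vector has 3 blocks × 4 capsules × 16 dimensions = 192 entries.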