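Computer science
Abstraction
Artificial intelligence
Artificial neural network
Convolutional neural network
Speech recognition
Process (computing)
Sensitivity (control systems)
Pattern recognition (psychology)
Curse of dimensionality
Feature extraction
Task (project management)
Epistemology
Management
Economics
Philosophy
Engineering
Operating system
Electronic engineering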
Authors
Xin-Cheng Wen, Kunhong Liu, Weiming Zhang, Kai Jiang
Identifier
DOI:10.1109/icpr48806.2021.9412360
Abstract
Speech emotion recognition (SER) is an important and challenging task. It requires a machine learning model to process a person's speech signals and judge their emotional state accurately. Due to the high dimensionality of audio data, the extracted features are always noisy. Moreover, the abstraction of audio features makes it impossible to fully use the inherent relationships among them. This paper proposes a model, named CapCNN, that combines a convolutional neural network (CNN) and a capsule network (CapsNet). The advantage of CapCNN is that it addresses time sensitivity and captures the overall characteristics of the signal. The study finds that CapCNN handles the SER task well. Compared with other state-of-the-art methods, the algorithm achieves high performance on the CASIA and EMODB datasets. A detailed analysis confirms that the method provides balanced results across classes.