Computer science
Speech recognition
Fusion
Task (project management)
Emotion recognition
Voice activity detection
Artificial intelligence
Feature (linguistics)
Speech processing
Feature extraction
Pattern recognition (psychology)
Philosophy
Linguistics
Management
Economics
Authors
Chenquan Gan, Kexin Wang, Qingyi Zhu, Yong Xiang, Deepak Kumar Jain, Salvador García
Source
Journal: Neurocomputing
[Elsevier]
Date: 2023-07-28
Volume/Article No.: 555: 126623
Cited by: 4
Identifier
DOI: 10.1016/j.neucom.2023.126623
Abstract
Speech, as an essential medium for expressing emotion, plays a vital role in human communication. As research on emotion recognition in human–computer interaction deepens, speech emotion recognition (SER) has become an essential task for improving the human–computer interaction experience. When extracting emotion features from speech, methods that cut the speech spectrum destroy the continuity of the speech signal, while cascaded architectures that avoid cutting the spectrum cannot extract spectral information from the temporal and spatial domains simultaneously. To this end, we propose a spatial–temporal parallel network for speech emotion recognition that does not cut the speech spectrum. To further mix the temporal and spatial features, we design a novel fusion method, called multiple fusion, which combines concatenate fusion with an ensemble strategy. Experimental results on five datasets demonstrate that the proposed method outperforms state-of-the-art methods.
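The abstract's "multiple fusion" combines two standard ideas: feature-level concatenation of the parallel spatial and temporal branch outputs, and a decision-level ensemble over classifier predictions. A minimal sketch of those two operations, in plain Python with toy vectors (the function names and the simple averaging rule are illustrative assumptions, not the authors' exact design):

```python
def concat_fusion(spatial_feats, temporal_feats):
    """Feature-level fusion: concatenate the two branch feature vectors."""
    return spatial_feats + temporal_feats  # list concatenation

def ensemble_fusion(probs_a, probs_b):
    """Decision-level fusion: average two class-probability vectors
    (a simple stand-in for the paper's ensemble strategy)."""
    return [(a + b) / 2 for a, b in zip(probs_a, probs_b)]

# Toy example: spatial and temporal branch features are joined into one
# vector, then two classifiers' 3-class emotion probabilities are averaged.
spatial = [0.1, 0.2, 0.3]
temporal = [0.4, 0.5]
fused = concat_fusion(spatial, temporal)   # 5-dimensional joint feature

p1 = [0.6, 0.3, 0.1]
p2 = [0.4, 0.5, 0.1]
avg = ensemble_fusion(p1, p2)              # averaged class probabilities
```

In a real SER pipeline the fused feature vector would feed a classifier head, and the ensemble would combine the heads trained on the concatenated and the individual branch features.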