Authors
Devamanyu Hazarika, Sruthi Gorantla, Soujanya Poria, Roger Zimmermann
Identifier
DOI: 10.1109/mipr.2018.00043
Abstract
Multimodal emotion recognition is the task of detecting emotions present in user-generated multimedia content. Such resources contain complementary information in multiple modalities. A stiff challenge often faced is the complexity associated with feature-level fusion of these heterogeneous modes. In this paper, we propose a new feature-level fusion method based on self-attention mechanism. We also compare it with traditional fusion methods such as concatenation, outer-product, etc. Analyzed using textual and speech (audio) modalities, our results suggest that the proposed fusion method outperforms others in the context of utterance-level emotion recognition in videos.
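The abstract contrasts the proposed self-attention-based feature-level fusion with baselines such as concatenation and outer-product fusion of text and audio utterance features. The sketch below illustrates what these three fusion operations can look like; the module name AttentionFusion, the feature dimensions, the single-head scaled dot-product attention, and the mean pooling are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of three feature-level fusion strategies for two modalities
# (text and audio). All sizes and design details here are assumptions.
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Self-attention-style fusion over the two modality feature vectors
    (an assumed formulation for illustration, not the authors' code)."""

    def __init__(self, text_dim: int, audio_dim: int, hidden_dim: int):
        super().__init__()
        # Project both modalities into a shared hidden space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        # Scaled dot-product self-attention over the two modality "tokens".
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=1, batch_first=True)

    def forward(self, text_feat: torch.Tensor, audio_feat: torch.Tensor) -> torch.Tensor:
        # text_feat: (batch, text_dim), audio_feat: (batch, audio_dim)
        h_text = self.text_proj(text_feat)
        h_audio = self.audio_proj(audio_feat)
        # Stack the modalities as a length-2 sequence: (batch, 2, hidden_dim).
        seq = torch.stack([h_text, h_audio], dim=1)
        attended, _ = self.attn(seq, seq, seq)
        # Pool the attended modality vectors into one fused utterance representation.
        return attended.mean(dim=1)


def concat_fusion(text_feat: torch.Tensor, audio_feat: torch.Tensor) -> torch.Tensor:
    # Baseline: simple feature concatenation.
    return torch.cat([text_feat, audio_feat], dim=-1)


def outer_product_fusion(text_feat: torch.Tensor, audio_feat: torch.Tensor) -> torch.Tensor:
    # Baseline: flattened outer product of the two feature vectors.
    return torch.einsum("bi,bj->bij", text_feat, audio_feat).flatten(start_dim=1)


if __name__ == "__main__":
    text = torch.randn(4, 300)   # utterance-level text embeddings (assumed size)
    audio = torch.randn(4, 100)  # utterance-level acoustic features (assumed size)
    fusion = AttentionFusion(text_dim=300, audio_dim=100, hidden_dim=128)
    print(fusion(text, audio).shape)                # torch.Size([4, 128])
    print(concat_fusion(text, audio).shape)         # torch.Size([4, 400])
    print(outer_product_fusion(text, audio).shape)  # torch.Size([4, 30000])
```

The example also shows why fusion choice matters in practice: concatenation and especially the outer product grow the fused dimensionality quickly, whereas an attention-based fusion keeps a fixed-size representation while letting each modality weight the other.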