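Granularity
Computer science
Modal verb
Mode
Modality (human-computer interaction)
Computation
Transformer
Artificial intelligence
Algorithm
Engineering
Voltage
Sociology
Electrical engineering
Chemistry
Polymer chemistry
Operating system
Social science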
Authors
Weiquan Fan,Xiaofen Xing,Bolun Cai,Xiangmin Xu
Identifier
DOI:10.1109/icassp49357.2023.10095855
Abstract
Multi-modal emotion recognition is crucial for human-computer interaction. Many existing algorithms attempt to achieve multi-modal interaction through a cross-attention mechanism. Because the original attention mechanism introduces noise and is computationally heavy, window attention has become a new trend. However, emotions are expressed asynchronously across modalities, which makes it difficult for emotional information to interact across windows. Furthermore, multi-modal data are temporally misaligned, so a single fixed window size can hardly capture cross-modal information. In this paper, we address these two issues in a unified framework and propose multi-granularity attention based Transformers (MGAT), which tackle emotional asynchrony and modality misalignment through a multi-granularity attention mechanism. Experimental results confirm the effectiveness of our method, which achieves state-of-the-art performance on IEMOCAP.
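The abstract does not give MGAT's equations, so the following is only a minimal sketch of the general idea it describes, not the authors' implementation: cross-modal attention restricted to local windows, computed at several window sizes and then fused. The function names, the linear time alignment used to match sequences of different lengths, the window sizes, and plain averaging as the fusion step are all hypothetical assumptions chosen for illustration.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def _softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def window_cross_attention(queries: Matrix, keys: Matrix, values: Matrix,
                           window: int) -> Matrix:
    """Cross-modal attention in which query step t only sees the keys inside a
    local window around its aligned position in the other modality.
    Odd window sizes give a symmetric window of exactly `window` steps."""
    n_q, n_k = len(queries), len(keys)
    d = len(queries[0])
    half = window // 2
    out: Matrix = []
    for t, q in enumerate(queries):
        # Hypothetical alignment: map step t of the query sequence linearly onto
        # the key sequence's timeline, so sequences of different lengths line up.
        center = round(t * (n_k - 1) / max(n_q - 1, 1))
        lo, hi = max(0, center - half), min(n_k, center + half + 1)
        # Scaled dot-product scores, restricted to the window.
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
                  for j in range(lo, hi)]
        weights = _softmax(scores)
        # Weighted sum of the value vectors inside the window.
        acc = [0.0] * len(values[0])
        for w, j in zip(weights, range(lo, hi)):
            for i, v in enumerate(values[j]):
                acc[i] += w * v
        out.append(acc)
    return out


def multi_granularity_attention(queries: Matrix, keys: Matrix, values: Matrix,
                                windows: Sequence[int] = (1, 3, 5)) -> Matrix:
    """Run window attention at several window sizes (granularities) and average
    the results; averaging is an assumed fusion rule, chosen for simplicity."""
    per_scale = [window_cross_attention(queries, keys, values, w) for w in windows]
    return [[sum(scale[t][i] for scale in per_scale) / len(per_scale)
             for i in range(len(values[0]))]
            for t in range(len(queries))]


if __name__ == "__main__":
    # Toy features: 4 audio steps attend to 6 text steps (made-up numbers).
    audio = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.4], [0.2, 0.2]]
    text = [[0.5, 0.1], [0.2, 0.3], [0.1, 0.1], [0.4, 0.0], [0.3, 0.3], [0.0, 0.2]]
    fused = multi_granularity_attention(audio, text, text, windows=(1, 3, 5))
    print(len(fused), len(fused[0]))  # prints "4 2"
```

Small windows keep attention local, which limits noise and cost, while larger windows let a query reach emotional cues that appear earlier or later in the other modality; combining several sizes is one simple way to avoid committing to a single fixed window.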