Computer Science
Emotion Recognition
Speech Recognition
Speaker Recognition
Fusion
Human-Computer Interaction
Artificial Intelligence
Linguistics
Philosophy
Authors
Bao Zhen Yao,Wuzhen Shi
Identifier
DOI:10.1109/icassp48485.2024.10447720
Abstract
Existing methods for emotion recognition in conversations (ERC) focus on exploiting utterance information from different speakers to improve recognition performance, but they ignore the differential contributions of different utterances to emotion recognition. In this paper, we propose a speaker-centric multimodal fusion network for ERC, in which a bidirectional gated recurrent unit (BiGRU) is used for intra-modal feature fusion and graph convolution is used for speaker-centric cross-modal feature fusion. We construct a speaker-centric graph based on the differences between one speaker's utterances and those of the other speakers. This graph strengthens the network's focus on each speaker's own utterance information, effectively reducing interference from other speakers. In addition, we employ an Utterance Distance Attention (UDA) module that tailors the attention allocation to mitigate the impact of distant utterances on the current utterance. Experimental results on IEMOCAP and MELD demonstrate the effectiveness of our approach.
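The abstract describes a three-part pipeline: BiGRU-based intra-modal fusion, graph convolution over a speaker-centric graph, and distance-based attention over utterances. The following is a minimal PyTorch sketch of that pipeline, not the authors' implementation: the module names, dimensions, the adjacency construction (same-speaker edges weighted more strongly), and the distance-weighting formula are all assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SpeakerCentricERCSketch(nn.Module):
    def __init__(self, feat_dim=100, hidden_dim=64, num_classes=6):
        super().__init__()
        # Intra-modal fusion: BiGRU over the utterance sequence of one modality
        self.bigru = nn.GRU(feat_dim, hidden_dim, bidirectional=True, batch_first=True)
        # Plain GCN-style propagation weight (stand-in for the paper's graph convolution)
        self.gcn_weight = nn.Linear(2 * hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    @staticmethod
    def speaker_centric_adjacency(speakers):
        # Hypothetical construction: same-speaker utterance pairs get stronger edges
        # than cross-speaker pairs, emphasizing each speaker's own utterances.
        same = (speakers.unsqueeze(0) == speakers.unsqueeze(1)).float()
        adj = 0.5 + 0.5 * same
        return adj / adj.sum(dim=1, keepdim=True)  # row-normalize

    @staticmethod
    def distance_attention(num_utt, tau=2.0):
        # Utterance-distance weighting: utterances farther from the current one
        # receive smaller attention weights (an assumed form of the UDA idea).
        idx = torch.arange(num_utt).float()
        dist = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs()
        return F.softmax(-dist / tau, dim=-1)

    def forward(self, feats, speakers):
        # feats: (num_utterances, feat_dim); speakers: (num_utterances,)
        h, _ = self.bigru(feats.unsqueeze(0))   # (1, N, 2*hidden_dim)
        h = h.squeeze(0)
        adj = self.speaker_centric_adjacency(speakers)
        uda = self.distance_attention(h.size(0))
        # Combine speaker-centric edges with distance attention, then propagate
        prop = adj * uda
        prop = prop / prop.sum(dim=1, keepdim=True)
        h = F.relu(self.gcn_weight(prop @ h))
        return self.classifier(h)


if __name__ == "__main__":
    model = SpeakerCentricERCSketch()
    feats = torch.randn(8, 100)                      # 8 utterances, 100-dim features
    speakers = torch.tensor([0, 1, 0, 1, 0, 0, 1, 1])
    print(model(feats, speakers).shape)              # torch.Size([8, 6])

In this sketch the speaker-centric adjacency and the distance weights are fused by element-wise multiplication before a single propagation step; the paper's actual graph construction and fusion order may differ.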