Computer science
Crossmodal
Pattern
Modality (human-computer interaction)
Sentiment analysis
Key (lock)
Fusion
Representation (politics)
Information fusion
Process (computing)
Artificial intelligence
Natural language processing
Machine learning
Neuroscience
Psychology
Perception
Philosophy
Sociology
Operating system
Politics
Visual perception
Law
Linguistics
Computer security
Social science
Political science
Authors
Changqin Huang, Junling Zhang, Xuemei Wu, Yi Wang, Ming Li, Xiaodi Huang
Identifier
DOI:10.1016/j.knosys.2023.110502
Abstract
Multimodal sentiment analysis (MSA), which goes beyond the analysis of texts to include other modalities such as audio and visual data, has attracted significant attention. Effectively fusing sentiment information across multiple modalities is key to improving the performance of MSA. However, aligning multiple modalities during fusion poses challenges, such as preserving modality-specific information. This paper proposes a Text-centered Fusion Network with crossmodal Attention (TeFNA), a multimodal fusion network that uses crossmodal attention to model unaligned multimodal temporal information. In particular, TeFNA employs a Text-Centered Aligned fusion method (TCA) that takes the text modality as the primary modality to improve the representation of fusion features. In addition, TeFNA maximizes the mutual information between modality pairs to retain task-related emotional information, thereby ensuring that key modality information is preserved from input through fusion. The results of our comprehensive experiments on the multimodal datasets CMU-MOSI and CMU-MOSEI show that our proposed model outperforms existing methods on most metrics.
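As a hedged illustration of the text-centered crossmodal attention the abstract describes, the following PyTorch sketch shows how the text modality could act as the attention query while an auxiliary modality (audio or visual) supplies keys and values. This is a minimal sketch under assumptions: the class name `TextCenteredCrossmodalAttention`, all dimensions, and the residual/pooling choices are illustrative, not the authors' released TeFNA implementation.

```python
# Minimal sketch of text-centered crossmodal attention (assumption: this is
# NOT the authors' released TeFNA code; names and dimensions are illustrative).
import torch
import torch.nn as nn

class TextCenteredCrossmodalAttention(nn.Module):
    """Text queries attend over an auxiliary modality (audio or visual).

    Unaligned sequences are handled naturally: attention does not require
    the text and auxiliary time steps to correspond one-to-one.
    """
    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, aux: torch.Tensor) -> torch.Tensor:
        # text: (batch, T_text, dim); aux: (batch, T_aux, dim).
        # T_text and T_aux may differ, so no pre-alignment is needed.
        fused, _ = self.attn(query=text, key=aux, value=aux)
        # Residual connection keeps text-specific information in the fusion.
        return self.norm(text + fused)

# Usage: fuse audio and visual cues into the text stream, then pool.
if __name__ == "__main__":
    B, Tt, Ta, Tv, D = 2, 20, 50, 30, 128
    text, audio, visual = (torch.randn(B, T, D) for T in (Tt, Ta, Tv))
    t2a = TextCenteredCrossmodalAttention(D)
    t2v = TextCenteredCrossmodalAttention(D)
    fused = t2a(text, audio) + t2v(text, visual)  # text-centered fusion
    sentiment_repr = fused.mean(dim=1)            # (B, D) utterance representation
    print(sentiment_repr.shape)
```

The abstract also states that TeFNA maximizes mutual information between modality pairs. The paper's exact estimator is not given here; as one common, illustrative choice, an InfoNCE-style contrastive loss lower-bounds the mutual information between two pooled modality representations:

```python
# Sketch of an InfoNCE-style MI lower bound between a modality pair
# (assumption: InfoNCE is used illustratively; it may differ from the
# estimator in the paper).
import torch
import torch.nn.functional as F

def infonce_mi_lower_bound(z_a: torch.Tensor, z_b: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """Contrastive loss whose negation lower-bounds MI(z_a; z_b).

    z_a, z_b: (batch, dim) pooled representations of two modalities,
    where row i of each tensor comes from the same sample (positive pair).
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature          # (batch, batch) similarities
    labels = torch.arange(z_a.size(0), device=z_a.device)
    # Matching pairs lie on the diagonal; off-diagonal entries are negatives.
    return F.cross_entropy(logits, labels)
```

Minimizing this loss maximizes a lower bound on the mutual information between the pair, which matches the stated goal of preserving task-related emotional information from input through fusion.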