Keywords
Computer science
Modality (human–computer interaction)
Artificial intelligence
Representation learning
Natural language processing
Sentiment analysis
Transformer (deep learning architecture)
Machine learning
Authors
Ronghao Lin, Haifeng Hu
Identifier
DOI: 10.1109/tmm.2023.3303711
Abstract
In the field of multimodal machine learning, multimodal sentiment analysis has been an active area of research. The predominant approaches focus on learning efficient multimodal representations that contain intra- and inter-modality information. However, the heterogeneous nature of different modalities poses great challenges to multimodal representation learning. In this paper, we propose a multi-stage fusion framework that dynamically fine-tunes multimodal representations via a hybrid-modal attention mechanism. Owing to the success of models pre-trained on large corpora, previous methods mostly fine-tune only the textual representation and neglect the inconsistency between different modality spaces. We therefore design a module called the Multimodal Shifting Gate (MSG) to fine-tune all three modalities by modeling inter-modality dynamics and shifting the representations. We also adopt a module named Masked Bimodal Adjustment (MBA) on the textual modality to reduce the inconsistency of parameter spaces and narrow the modality gap. In addition, we utilize syntactic-level and semantic-level textual features output from different layers of the Transformer model to sufficiently capture the intra-modality dynamics. Moreover, we construct a Shifting HuberLoss to robustly incorporate the variation of the shifting value into the training process. Extensive experiments on public datasets, including CMU-MOSI and CMU-MOSEI, demonstrate the efficacy of our approach.
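To make the shifting-gate idea concrete, below is a minimal sketch of the kind of mechanism the abstract describes: nonverbal modalities produce a gated shift vector that is added to the textual representation. All module names, feature dimensions, and the norm-based scaling rule are illustrative assumptions, not the authors' released MSG implementation.

```python
# A hedged sketch of a multimodal shifting gate (hypothetical, not the
# paper's code): audio and visual streams model inter-modality dynamics
# through element-wise gates and contribute a shift to the text vector.
import torch
import torch.nn as nn


class ShiftingGate(nn.Module):
    """Shift one modality's representation using the other two (assumed design)."""

    def __init__(self, dim_t: int, dim_a: int, dim_v: int, eps: float = 1e-6):
        super().__init__()
        # Gates conditioned on text concatenated with audio / visual features.
        self.gate_a = nn.Linear(dim_t + dim_a, dim_t)
        self.gate_v = nn.Linear(dim_t + dim_v, dim_t)
        # Project each nonverbal stream into the textual space before shifting.
        self.proj_a = nn.Linear(dim_a, dim_t)
        self.proj_v = nn.Linear(dim_v, dim_t)
        self.norm = nn.LayerNorm(dim_t)
        self.eps = eps

    def forward(self, h_t, h_a, h_v):
        # Element-wise gates in [0, 1] capture inter-modality dynamics.
        g_a = torch.sigmoid(self.gate_a(torch.cat([h_t, h_a], dim=-1)))
        g_v = torch.sigmoid(self.gate_v(torch.cat([h_t, h_v], dim=-1)))
        # Gated shift vector contributed by the audio and visual streams.
        shift = g_a * self.proj_a(h_a) + g_v * self.proj_v(h_v)
        # Scale the shift so it cannot overwhelm the textual representation.
        scale = torch.clamp(
            h_t.norm(dim=-1, keepdim=True)
            / (shift.norm(dim=-1, keepdim=True) + self.eps),
            max=1.0,
        )
        return self.norm(h_t + scale * shift), shift


# Toy usage with commonly assumed CMU-MOSI feature sizes (BERT text: 768).
gate = ShiftingGate(dim_t=768, dim_a=74, dim_v=47)
h_t, h_a, h_v = torch.randn(4, 768), torch.randn(4, 74), torch.randn(4, 47)
h_shifted, shift = gate(h_t, h_a, h_v)
print(h_shifted.shape, shift.shape)  # torch.Size([4, 768]) torch.Size([4, 768])
```

The gate returns the shift vector alongside the shifted representation so a downstream loss can regularize it, which connects to the Shifting HuberLoss sketched next.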
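The abstract also names a Shifting HuberLoss that "robustly incorporates the variation of the shifting value" into training. One plausible reading, sketched below, combines a Huber regression loss on the sentiment score with a Huber penalty on the shift vector; the zero target for the shift and the weight `beta` are assumptions, and the paper's exact formulation may differ.

```python
# A hedged sketch of a "Shifting HuberLoss" (assumed formulation): a robust
# Huber term on the sentiment prediction plus a Huber penalty that ties the
# magnitude of the shifting value into the training objective.
import torch
import torch.nn.functional as F


def shifting_huber_loss(pred, target, shift, delta=1.0, beta=0.1):
    # Robust regression term on the predicted sentiment score.
    task_loss = F.huber_loss(pred, target, delta=delta)
    # Robust penalty on the shifting value (here: pulled toward zero).
    shift_loss = F.huber_loss(shift, torch.zeros_like(shift), delta=delta)
    return task_loss + beta * shift_loss


pred, target = torch.randn(4), torch.randn(4)   # sentiment scores
shift = torch.randn(4, 768)                     # shift vectors from the gate
print(shifting_huber_loss(pred, target, shift))
```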