Authors
Hongcheng Shi, Yuanyuan Pu, Zhengpeng Zhao, Jian Huang, Dongming Zhou, Dan Xu, Jinde Cao
Identifier
DOI:10.1016/j.knosys.2023.111149
Abstract
Developing an effective multimodal representation has always been the crux of multimodal sentiment analysis. Different modalities possess distinct sentiment attributes across modality-invariant and modality-specific representation spaces. Prior studies have concentrated on using intricate networks to directly generate joint representations of the three modalities, and fail to exploit the relationships between the two representation spaces. To mitigate this, (1) we introduce a novel framework, the Co-space Representation Interaction Network (CRNet), which leverages different acoustic and visual representation subspaces to interact with the linguistic modality. (2) To construct a joint representation by coordinating the acoustic and visual spaces with the linguistic modality, we propose a novel module named Gradient-based Representation Enhancement (GRE), which is effective at extracting significant variation in the representation matrices. (3) We design a novel multi-task strategy that optimizes the training process to improve the performance of the different representation combinations drawn from the two spaces. Experimental results demonstrate that our framework achieves state-of-the-art (SOTA) performance on the CMU-MOSI, CMU-MOSEI and CH-SIMS datasets.
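To make the co-space idea concrete, the sketch below illustrates the general shape of such a fusion: each nonverbal modality (acoustic, visual) is mapped into two subspaces, one shared across modalities ("invariant") and one modality-specific, and both interact with the linguistic features to form a joint representation. This is a minimal toy illustration only; the projections, dimensions, and the simple additive fusion are hypothetical stand-ins, not the paper's actual CRNet or GRE modules.

```python
import numpy as np

# Hypothetical toy dimensions (not taken from the paper).
T, d_l, d_a, d_v, d = 8, 16, 12, 10, 16

rng = np.random.default_rng(0)
language = rng.standard_normal((T, d_l))  # linguistic features per time step
acoustic = rng.standard_normal((T, d_a))
visual = rng.standard_normal((T, d_v))

def project(x, d_out, seed):
    """Fixed random linear projection standing in for a learned subspace map."""
    w = np.random.default_rng(seed).standard_normal((x.shape[1], d_out))
    return x @ (w / np.sqrt(x.shape[1]))

# Two coordinated spaces per nonverbal modality: a shared ("invariant")
# subspace and a modality-specific subspace.
a_inv, a_spec = project(acoustic, d, 1), project(acoustic, d, 2)
v_inv, v_spec = project(visual, d, 3), project(visual, d, 4)
l_proj = project(language, d, 5)

# Joint representation: the linguistic features interact with both spaces
# (here via simple additive fusion, purely for illustration).
joint = l_proj + 0.5 * (a_inv + v_inv) + 0.5 * (a_spec + v_spec)
print(joint.shape)
```

Running the sketch produces a joint representation of shape `(8, 16)`; in the actual framework the subspace maps would be learned and the interaction would be far richer than addition.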