Keywords: computer science; interpretability; artificial intelligence; machine learning; modality (human-computer interaction); sentiment analysis; multimodal learning; natural language processing
Authors
Ronghao Lin,Haifeng Hu
DOI: 10.1109/taffc.2023.3282410
Abstract
In the field of Multimodal Sentiment Analysis (MSA), prevailing methods devote themselves to developing intricate network architectures to capture intra- and inter-modal dynamics, which requires numerous parameters and hampers the interpretability of multimodal modeling. Besides, the heterogeneous nature of the modalities (text, audio, and vision) introduces significant modality gaps, making multimodal representation learning an ongoing challenge. To address these issues, we regard the learning process of each modality as a subtask and propose a novel approach named Multi-Task Momentum Distillation (MTMD), which reduces the gap among different modalities. Specifically, according to the abundance of semantic information, we treat the subtasks of textual and multimodal representations as teacher networks and the subtasks of acoustic and visual representations as student networks, performing knowledge distillation that transfers sentiment-related knowledge under the guidance of the regression and classification subtasks. Additionally, we adopt unimodal momentum models to deeply explore modality-specific knowledge and employ adaptive momentum fusion factors to learn a robust multimodal representation. Furthermore, we provide a theoretical perspective of mutual information maximization by interpreting MTMD as generating sentiment-related views in various ways. Extensive experiments demonstrate the superiority of our approach over state-of-the-art MSA methods.
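As a rough illustration of two mechanisms the abstract names (momentum models and teacher-to-student distillation), the PyTorch sketch below pairs an exponential-moving-average weight update with an MSE distillation loss and softmax fusion weights. It is an assumption-laden toy, not the authors' MTMD code: the encoder shapes, the 0.995 momentum coefficient, the MSE objectives, the 0.1 distillation weight, and helper names such as `ema_update` and `forward_step` are all illustrative choices.

```python
# Illustrative sketch only (not the authors' implementation): momentum (EMA)
# models plus a teacher->student distillation loss and adaptive fusion weights,
# loosely following the ideas described in the abstract.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnimodalEncoder(nn.Module):
    """Toy per-modality encoder mapping raw features to a shared space."""
    def __init__(self, in_dim: int, hid_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                                 nn.Linear(hid_dim, hid_dim))

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def ema_update(momentum_model: nn.Module, online_model: nn.Module, m: float = 0.995):
    """Momentum (EMA) update: slow weights track the online weights."""
    for p_m, p_o in zip(momentum_model.parameters(), online_model.parameters()):
        p_m.mul_(m).add_(p_o, alpha=1.0 - m)

# Online encoders: text on the teacher side, audio/vision on the student side.
# Input dimensions are arbitrary placeholders.
text_enc = UnimodalEncoder(in_dim=300)
audio_enc = UnimodalEncoder(in_dim=74)
vision_enc = UnimodalEncoder(in_dim=35)
# Momentum copy of the text encoder: updated only by EMA, never by gradients.
text_mom = copy.deepcopy(text_enc).requires_grad_(False)

# Learnable fusion factors, normalized to softmax weights over modalities.
fusion_logits = nn.Parameter(torch.zeros(3))

def forward_step(x_t, x_a, x_v, y):
    h_t, h_a, h_v = text_enc(x_t), audio_enc(x_a), vision_enc(x_v)
    # Adaptive fusion: weighted sum of unimodal representations.
    w = torch.softmax(fusion_logits, dim=0)
    h_fused = w[0] * h_t + w[1] * h_a + w[2] * h_v
    # Teacher signal from the momentum text encoder (no gradient flows here).
    with torch.no_grad():
        h_t_teacher = text_mom(x_t)
    # Distillation: pull audio/vision representations toward the teacher.
    distill = F.mse_loss(h_a, h_t_teacher) + F.mse_loss(h_v, h_t_teacher)
    # Sentiment regression on the fused representation (a stand-in head).
    pred = h_fused.mean(dim=-1)
    task = F.mse_loss(pred, y)
    return task + 0.1 * distill

# Example: one forward pass on random features (batch of 4).
loss = forward_step(torch.randn(4, 300), torch.randn(4, 74),
                    torch.randn(4, 35), torch.randn(4))
# After each optimizer step, refresh the momentum weights:
# ema_update(text_mom, text_enc)
```

The point of freezing the momentum copy and refreshing it only via `ema_update` after each optimizer step is that its outputs change slowly, giving the student encoders a stable teacher signal rather than one that shifts with every gradient update.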