Computer science
Artificial intelligence
Pattern
Modality (human-computer interaction)
Classification
Timestamp
Invariant (physics)
Machine learning
Pattern recognition (psychology)
Mathematics
Mathematical physics
Social science
Computer security
Sociology
Authors
Hao Sun, Bo Liu, Yen-Wei Chen, Lanfen Lin
Identifier
DOI:10.1016/j.inffus.2022.10.031
Abstract
Multimodal sentiment classification is a notable research field that aims to extract sentiment-related information and classify sentiment tendency from sequential multimodal data. Most existing sentiment recognition algorithms explore multimodal fusion schemes and achieve good performance. However, there are two key challenges to overcome. First, it is essential to effectively extract inter- and intra-modality features prior to fusion, while simultaneously reducing ambiguity. The second challenge is how to learn modality-invariant representations that capture the underlying similarities. In this paper, we present a modality-invariant temporal learning technique and a new gated inter-modality attention mechanism to overcome these issues. For the first challenge, our proposed gated inter-modality attention mechanism performs modality interactions and filters inconsistencies from multiple modalities in an adaptive manner. We also use parallel structures to learn more comprehensive sentiment information in pairs (i.e., acoustic and visual). In addition, to address the second problem, we treat each modality as a multivariate Gaussian distribution (considering each timestamp as a single Gaussian distribution) and use the KL divergence to capture the implicit temporal distribution-level similarities. These strategies help reduce domain shifts between different modalities and extract effective sequential modality-invariant representations. We have conducted experiments on several public datasets (i.e., YouTube and MOUD), and the results show that our proposed method outperforms state-of-the-art multimodal sentiment categorization methods.
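The abstract names two concrete mechanisms: a gated inter-modality attention step that filters inconsistent cross-modal cues, and a timestamp-wise Gaussian treatment of each modality aligned with a KL-divergence term. The sketch below is a minimal, hypothetical PyTorch rendering of those two ideas, not the authors' released code; the module names, the gating form, and the diagonal-Gaussian assumption per timestamp are all assumptions made for illustration.

```python
# Hypothetical sketch (not the paper's implementation): a gated inter-modality
# attention block and a timestamp-wise diagonal-Gaussian KL term, assuming each
# modality is a sequence of shape (batch, T, dim) with equal length T.
import torch
import torch.nn as nn


class GatedInterModalityAttention(nn.Module):
    """Cross-attend modality A to modality B, then gate the attended features."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)  # per-feature sigmoid gate

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Query with modality A, attend over modality B.
        attended, _ = self.attn(a, b, b)
        g = torch.sigmoid(self.gate(torch.cat([a, attended], dim=-1)))
        # The gate adaptively suppresses inconsistent cross-modal information.
        return g * attended + (1.0 - g) * a


def timestampwise_gaussian_kl(mu_p, logvar_p, mu_q, logvar_q):
    """KL(N_p || N_q) per timestamp for diagonal Gaussians.

    All inputs have shape (batch, T, dim). Treating every timestamp as an
    independent diagonal Gaussian is one way to realize the abstract's
    "each timestamp as a single Gaussian distribution" alignment.
    """
    var_p, var_q = logvar_p.exp(), logvar_q.exp()
    kl = 0.5 * (logvar_q - logvar_p + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
    return kl.sum(dim=-1).mean()  # sum over features, mean over batch and time
```

In a training loop, such a KL term would be added to the classification loss with a small weight to pull the per-timestamp distributions of the paired modalities (e.g., acoustic and visual) toward each other, which is how the abstract's "reducing domain shifts" objective could be operationalized under these assumptions.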