UniMF: A Unified Multimodal Framework for Multimodal Sentiment Analysis in Missing Modalities and Unaligned Multimodal Sequences

计算机科学模式多模态模式治疗法人工智能情绪分析变压器自然语言处理万维网社会科学量子力学医学物理外科社会学电压

作者

Ruohong Huan,Guowei Zhong,Peng Chen,Ronghua Liang

出处

期刊：IEEE Transactions on Multimedia [Institute of Electrical and Electronics Engineers]
日期：2023-12-04 卷期号：26: 5753-5768 被引量：8

标识

DOI：10.1109/tmm.2023.3338769

摘要

In current multimodal sentiment analysis, aligned and complete multimodal sequences are often crucial. Obtaining complete multimodal data in the real world presents various challenges, and aligning multimodal sequences often requires a significant amount of effort. Unfortunately, most multimodal sentiment analysis methods fail when dealing with missing modalities or unaligned multimodal sequences. To tackle these two challenges simultaneously in a simple and lightweight manner, we present the Unified Multimodal Framework (UniMF). The primary components of UniMF comprise two distinct modules. The first module, Translation Module, translates missing modalities using information from existing modalities. The second module, Prediction Module, uses the attention mechanism to fuse the multimodal information and generate predictions. To enhance the translation performance of the Translation Module, we introduce the Multimodal Generation Mask (MGM) and utilize it to construct the Multimodal Generation Transformer (MGT). The MGT can generate the missing modality while focusing on information from existing modalities. Furthermore, we introduce the Multimodal Understanding Transformer (MUT) in the Prediction Module, which includes the Multimodal Understanding Mask (MUM) and a unique sequence, MultiModalSequence ( MMSeq ), representing a unified multimodality. To assess the performance of UniMF, we perform experiments on four multimodal sentiment datasets, and UniMF attains competitive or state-of-the-art outcomes with fewer learnable parameters. Furthermore, the experimental outcomes signify that UniMF, supported by MGT and MUT - two transformers utilizing special attention mechanisms, can efficiently manage both generating task of missing modalities and understanding task of unaligned multimodal sequences.

求助该文献

最长约 10秒，即可获得该文献文件

UniMF: A Unified Multimodal Framework for Multimodal Sentiment Analysis in Missing Modalities and Unaligned Multimodal Sequences

今日热心研友