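Sarcasm
Computer science
Fuse (electrical)
Modality (human–computer interaction)
Artificial intelligence
Mode
Modal verb
Autoencoder
Process (computing)
Natural language processing
Image fusion
Macro
Machine learning
Image (mathematics)
Deep learning
Linguistics
Engineering
Irony
Social science
Philosophy
Chemistry
Electrical engineering
Sociology
Polymer chemistry
Programming language
Operating system
Authors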
Jie Wang,Yan Yang,Yongquan Jiang,Minbo Ma,Zhuyang Xie,Tianrui Li
Identifier
DOI:10.1016/j.inffus.2023.102132
Abstract
Sarcasm embodies a linguistic phenomenon that highlights a significant incongruity between the literal meanings of words and intended attitudes. With the proliferation of image–text content on social media, the task of multi-modal sarcasm detection (MSD) has gained considerable attention recently. Tremendous progress has been made in developing better MSD models, primarily relying on a straightforward extract-then-fuse paradigm. However, such a setting encounters two potential challenges. First, separately pre-trained unimodal models used to extract visual and textual features frequently lack the fundamental alignment capabilities required for effective multi-modal data integration. Second, the detrimental modality gaps between vision and language make it challenging to comprehensively integrate multi-modal information solely via diverse cross-modal fusion techniques. Together, these issues make it difficult to further capture cross-modal incongruity and improve the effectiveness of MSD. In this paper, we propose a Multi-modal Mutual Learning (MuMu) network to tackle these issues. Specifically, we initialize the MuMu network with image and text encoders from the large-scale Contrastive Language-Image Pretraining (CLIP) model to enhance the underlying image–text correspondence. Moreover, to improve the capability of capturing cross-modal inconsistency during the fusion process, we design an align-fuse-collaborate mechanism that aligns disparate modalities before fusion and, after fusion, enhances the collaborative modeling ability between the two modalities with mutual learning. The proposed MuMu achieves new state-of-the-art results on a public dataset, demonstrating a substantial improvement of approximately 3% to 9% in terms of accuracy, micro-F1, and macro-F1 scores.
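The abstract does not spell out the mutual-learning objective, so the following is only a minimal sketch of one common formulation (a deep-mutual-learning-style loss), not the authors' implementation. It assumes two hypothetical classifier heads, one per modality, each trained with cross-entropy plus a KL term that pulls its prediction toward the other head's prediction. The function names, the `weight` parameter, and the two-class logits are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the true class index `label`."""
    return -math.log(probs[label] + eps)


def mutual_learning_losses(text_logits, image_logits, label, weight=1.0):
    """Return (text_loss, image_loss) for one sample.

    Assumption: each branch is supervised by the label and additionally
    mimics the other branch's predicted distribution via a KL term.
    """
    p_text = softmax(text_logits)
    p_image = softmax(image_logits)
    # Text branch: match the label and the image branch's prediction.
    text_loss = cross_entropy(p_text, label) + weight * kl_divergence(p_image, p_text)
    # Image branch: match the label and the text branch's prediction.
    image_loss = cross_entropy(p_image, label) + weight * kl_divergence(p_text, p_image)
    return text_loss, image_loss


if __name__ == "__main__":
    # Hypothetical logits for classes (sarcastic, not sarcastic); label 0 = sarcastic.
    print(mutual_learning_losses([2.0, 0.5], [0.5, 2.0], label=0))
```

When the two branches agree, the KL terms vanish and each loss reduces to plain cross-entropy; disagreement adds a penalty to both branches, which is the collaborative signal that mutual learning relies on.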