Computer science
Sentiment analysis
Artificial intelligence
Image (mathematics)
Context (archaeology)
Feature (linguistics)
Modal
Natural language processing
Pattern recognition (psychology)
Linguistics
Biology
Philosophy
Paleontology
Chemistry
Polymer chemistry
Authors
Tong Zhu, Leida Li, Jufeng Yang, Sicheng Zhao, Hantao Liu, Jiansheng Qian
Identifier
DOI: 10.1109/tmm.2022.3160060
Abstract
More and more users are accustomed to posting images and text on social networks to share their emotions or opinions. Accordingly, multimodal sentiment analysis has become a research topic of increasing interest in recent years. Typically, an image contains affective regions that evoke human sentiment, and these regions are usually echoed by corresponding words in people's comments. Similarly, people tend to portray the affective regions of an image when composing image descriptions. As a result, the relationship between affective image regions and the associated text is of great significance for multimodal sentiment analysis. However, most existing multimodal sentiment analysis approaches simply concatenate features from image and text, which cannot fully exploit the interaction between the two modalities and leads to suboptimal results. Motivated by this observation, we propose a new image-text interaction network (ITIN) to investigate the relationship between affective image regions and text for multimodal sentiment analysis. Specifically, we introduce a cross-modal alignment module to capture region-word correspondence, based on which multimodal features are fused through an adaptive cross-modal gating module. Moreover, considering the complementary role of contextual information in sentiment analysis, we integrate the individual-modal contextual feature representations to achieve more reliable predictions. Extensive experimental results and comparisons on public datasets demonstrate that the proposed model outperforms state-of-the-art methods.
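To make the fusion idea concrete, below is a minimal PyTorch sketch of a gated cross-modal fusion step in the spirit of the adaptive cross-modal gating the abstract describes. It is an illustrative assumption rather than the authors' actual ITIN implementation: the class name GatedFusion, the 512-dimensional feature size, and the convex-combination gating formula are all hypothetical.

import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse an image-region feature with an aligned word feature via a learned gate.

    Hypothetical sketch; not the authors' ITIN module."""

    def __init__(self, dim: int = 512):
        super().__init__()
        # The gate is computed from the concatenated pair and decides, per
        # dimension, how much each modality contributes to the fused feature.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, region: torch.Tensor, word: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([region, word], dim=-1))  # (batch, dim), values in (0, 1)
        return g * region + (1.0 - g) * word              # element-wise convex combination

# Usage: fuse batches of 512-d region and word features.
fuse = GatedFusion(dim=512)
region_feat = torch.randn(4, 512)   # e.g. affective-region features from a detector
word_feat = torch.randn(4, 512)     # e.g. word features aligned to those regions
fused = fuse(region_feat, word_feat)
print(fused.shape)                  # torch.Size([4, 512])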