Computer science
Artificial intelligence
Feature (linguistics)
Pattern
Contrast (vision)
Multimodal learning
Natural language processing
Sentence
Modality (human–computer interaction)
Pattern recognition (psychology)
Machine learning
Social science
Philosophy
Linguistics
Sociology
Authors
Aibo Guo,Xiang Zhao,Zhen Tan,Weidong Xiao
Identifier
DOI:10.1145/3583780.3614967
Abstract
Multimodal Named Entity Recognition (MNER) aims to combine data from different modalities (e.g., text, images, and videos) to recognize and classify named entities, which is crucial for constructing Multimodal Knowledge Graphs (MMKGs). However, existing research suffers from two prominent issues: over-reliance on textual features while neglecting visual features, and the lack of effective reduction of the feature-space discrepancy of multimodal data. To overcome these challenges, this paper proposes a Multi-Grained Interaction Contrastive Learning framework for the MNER task, namely MGICL. MGICL slices data into different granularities, i.e., sentence level/word-token level for text, and image level/object level for images. By utilizing multimodal features at different granularities, the framework enables cross-contrast and narrows the feature-space discrepancy between modalities. Moreover, it helps the text acquire valuable visual features. Additionally, a visual gate control mechanism is introduced to dynamically select relevant visual information, thereby reducing the impact of visual noise. Experimental results demonstrate that the proposed MGICL framework satisfactorily tackles the challenges of MNER by enhancing the information interaction of multimodal data and reducing the effect of noise, and hence effectively improves MNER performance.
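The abstract describes two generic mechanisms: a contrastive objective that pulls aligned text/image features together (and pushes unaligned pairs apart), and a gate that scales visual features by their relevance to the text. The sketch below illustrates both in plain Python with an InfoNCE-style loss and a sigmoid gate; it is a minimal illustration of the general techniques, not the paper's actual MGICL implementation, and all function names and the temperature value are the author's own assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(text_feats, image_feats, tau=0.1):
    """InfoNCE-style contrastive loss over a batch of aligned pairs.

    text_feats[i] and image_feats[i] form a positive pair; every other
    (i, j) combination in the batch acts as a negative. Lower loss means
    the paired features are closer together than the unpaired ones,
    i.e. the modality feature spaces are better aligned.
    """
    n = len(text_feats)
    loss = 0.0
    for i in range(n):
        sims = [math.exp(cosine(text_feats[i], image_feats[j]) / tau)
                for j in range(n)]
        loss += -math.log(sims[i] / sum(sims))
    return loss / n

def visual_gate(text_vec, image_vec):
    """Scale visual features by a scalar gate in (0, 1).

    The gate is a sigmoid of the text-image similarity, so visual
    features irrelevant to the text are attenuated (noise reduction).
    """
    g = 1.0 / (1.0 + math.exp(-cosine(text_vec, image_vec)))
    return [g * v for v in image_vec]
```

In the same spirit as the abstract, the contrastive loss could be applied at several granularities (sentence vs. image, token vs. object) by feeding it features sliced at each level; the gated visual features would then be fused with the text representation before entity tagging.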