Computer science
Artificial intelligence
Natural language processing
Task (project management)
Question answering
Modal verb
Graphics
Information retrieval
Theoretical computer science
Economics
Chemistry
Management
Polymer chemistry
Authors
Jie Wang, Yan Yang, Keyu Liu, Zengwei Zhu, Xiaorong Liu
Source
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
[Institute of Electrical and Electronics Engineers]
Date: 2023-01-01
Volume 31, pp. 111-120
Citations: 3
Identifier
DOI:10.1109/taslp.2022.3221017
Abstract
Multi-modal Named Entity Recognition (MNER), which mainly focuses on enhancing text-only NER with visual information, has recently attracted considerable attention. Most current MNER models have made significant progress by jointly understanding visual and language modalities through layers of cross-modality attention. However, these approaches largely ignore the visual bias introduced by image contents and barely consider exploiting multi-granularity representations and the interactions between visual objects, which are essential in recognizing ambiguous entities. In this paper, we propose a Scene graph driven Multi-modal Multi-granularity Multi-task learning (M3S) framework to better exploit visual and textual information in MNER. Specifically, to explicitly alleviate visual bias, we present a novel multi-task approach that employs the task of Named Entity Segmentation (NES) cascaded with Named Entity Categorization (NEC). To obtain detailed visual semantics by explicitly modeling objects and the relationships between paired objects, we construct scene graphs as a structured representation of the visual contents. Furthermore, a well-designed Multi-granularity Gated Aggregation (MGA) mechanism is introduced to capture inter-modality interactions and extract critical features for named entity recognition. Extensive experiments on two real-world public datasets demonstrate the effectiveness of our proposed M3S.
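The gated aggregation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, shapes, and the single-granularity simplification are assumptions for illustration only. A learned sigmoid gate mixes a textual feature vector with a visual one per dimension, so the model can suppress uninformative visual input:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_aggregation(text_feat, visual_feat, W, b):
    """Fuse one textual and one visual feature vector with a learned gate.

    Hypothetical simplification of the paper's Multi-granularity Gated
    Aggregation: the gate (values in (0, 1)) decides, per dimension, how
    much of each modality enters the fused representation.
    """
    concat = np.concatenate([text_feat, visual_feat])  # shape (2d,)
    gate = sigmoid(W @ concat + b)                     # shape (d,)
    # Convex combination per dimension: gate -> 1 keeps text, -> 0 keeps vision.
    return gate * text_feat + (1.0 - gate) * visual_feat

# Toy example with feature dimension d = 4
rng = np.random.default_rng(0)
d = 4
text_feat = rng.standard_normal(d)
visual_feat = rng.standard_normal(d)
W = rng.standard_normal((d, 2 * d)) * 0.1  # gate parameters (learned in practice)
b = np.zeros(d)

fused = gated_aggregation(text_feat, visual_feat, W, b)
print(fused.shape)  # prints (4,)
```

Because the gate output lies strictly between 0 and 1, each fused dimension stays between the corresponding textual and visual values, which is what lets the mechanism damp visual bias rather than overwrite the text representation.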