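Computer science
Modal verb
Bridging (networking)
Artificial intelligence
Salient
Transformer
Named-entity recognition
Natural language processing
Entity linking
Feature (linguistics)
Context (archaeology)
Task (project management)
Pattern recognition (psychology)
Knowledge base
Engineering
Voltage
Paleontology
Chemistry
Polymer chemistry
Systems engineering
Philosophy
Electrical engineering
Biology
Linguistics
Computer network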
Authors
Xuwu Wang,Jiabo Ye,Zhixu Li,Junfeng Tian,Yong Jiang,Ming Yan,Ji Zhang,Yanghua Xiao
Identifier
DOI:10.1109/icme52920.2022.9859972
Abstract
Multimodal named entity recognition (MNER) aims to detect and classify named entities in multimodal scenarios. It requires bridging the gap between natural language and visual context, which presents a two-fold challenge: cross-modal alignment is diverse, and cross-modal interaction is sometimes implicit. Existing MNER methods are vulnerable to some implicit interactions and are prone to overlooking the significant features involved. To tackle this problem, we propose to refine cross-modal attention by identifying and highlighting task-salient features. The saliency of each feature is measured by its correlation with expanded entity label words derived from external knowledge bases. We further propose an end-to-end Transformer-based MNER framework, which has a simpler architecture yet achieves better performance than previous methods. Extensive experiments validate the merits of our method. Moreover, our method shows a significant advantage in data efficiency and generalization ability.
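
The idea of weighting cross-modal attention by task saliency can be illustrated with a toy sketch. The abstract does not give the formulation, so everything below is an assumption made for illustration: cosine similarity stands in for "correlation", saliency is taken as the best match against any label-word vector, and the function names (`saliency_scores`, `refined_attention`) and all vectors are hypothetical rather than taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def saliency_scores(features, label_word_vecs):
    """Assumed saliency: best cosine match against any label word, clipped at 0."""
    return [
        max(0.0, max(cosine(f, w) for w in label_word_vecs))
        for f in features
    ]


def refined_attention(query, features, label_word_vecs):
    """Scaled dot-product attention over features, reweighted by saliency."""
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, f)) / scale for f in features]
    base = softmax(logits)
    sal = saliency_scores(features, label_word_vecs)
    weighted = [a * s for a, s in zip(base, sal)]
    total = sum(weighted)
    # Fall back to plain attention if no feature is salient at all.
    return [w / total for w in weighted] if total > 0 else base


# Toy data (hypothetical): two visual features, one expanded label-word vector.
visual_features = [[1.0, 0.0], [0.0, 1.0]]
label_words = [[1.0, 0.0]]
text_query = [1.0, 1.0]

weights = refined_attention(text_query, visual_features, label_words)
```

In this toy setup plain attention would treat both features equally, while the saliency term shifts the weight toward the feature aligned with the label word. A real system would use learned embeddings and a learned way of combining saliency with attention; the multiplicative reweighting here is only one simple choice.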