Computer science
Task (project management)
Relation (database)
Schema
Object (grammar)
Benchmark (surveying)
Relationship extraction
Information retrieval
Relational database
Field (mathematics)
Modality (human-computer interaction)
Artificial intelligence
Code (set theory)
Natural language processing
Information extraction
Data mining
Programming language
Sociology
Economics
Set (abstract data type)
Management
Pure mathematics
Geography
Social science
Mathematics
Geodesy
Authors
Liang He, Hongke Wang, Yongchang Cao, Zhen Wu, Jianbing Zhang, Xin Dai
Identifier
DOI:10.1145/3581783.3612209
Abstract
Extracting relational facts from multimodal data is a crucial task in the fields of multimedia and knowledge graphs, feeding into a wide range of real-world applications. Recent studies focus on recognizing relational facts whose two entities both appear in a single modality, drawing on the other modalities only for supplementary information. However, such works disregard the substantial number of multimodal relational facts that span modalities, e.g., with one entity appearing in the text and the other in an image. In this paper, we propose a new task, Multimodal Object-Entity Relation Extraction, which aims to extract "object-entity" relational facts from paired image and text data. To facilitate research on this task, we introduce MORE, a new dataset comprising 21 relation types and 20,136 multimodal relational facts annotated on 3,522 pairs of textual news titles and their corresponding images. To illustrate the challenges of Multimodal Object-Entity Relation Extraction, we evaluate recent state-of-the-art methods for multimodal relation extraction and conduct a comprehensive experimental analysis on MORE. Our results reveal significant challenges for existing methods, underlining the need for further research on this task, and based on our experiments we identify several promising directions for future work. The MORE dataset and code are available at https://github.com/NJUNLP/MORE.
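To make the task formulation concrete, the minimal Python sketch below shows one plausible way to represent a single "object-entity" relational fact, where the head is an object localized in the news image and the tail is an entity mentioned in the news title. This is an illustrative assumption only; the field names, relation label, and example values are hypothetical and are not taken from the official MORE schema or repository.

# Hypothetical sketch of an "object-entity" relational fact (not the official MORE schema).
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ObjectEntityFact:
    image_path: str                          # news image containing the visual object (head)
    title: str                               # textual news title containing the entity (tail)
    object_bbox: Tuple[int, int, int, int]   # (x1, y1, x2, y2) bounding box of the object
    entity_span: Tuple[int, int]             # character offsets of the entity in the title
    relation: str                            # one of the dataset's 21 relation types

# Illustrative example; all values are made up for demonstration.
fact = ObjectEntityFact(
    image_path="images/example.jpg",
    title="The president visits the new stadium",
    object_bbox=(34, 50, 412, 380),
    entity_span=(4, 13),                     # the substring "president"
    relation="present_in",                   # hypothetical relation label
)
print(fact)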