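Computer science
Inference
Multimedia
Extraction (chemistry)
Event (particle physics)
Artificial intelligence
Computer graphics (images)
Chemistry
Physics
Chromatography
Quantum mechanics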
Authors
Fang Liu, Fang Liu, Licheng Jiao, Qianyue Bao, Lei Sun, Shuo Li, Lingling Li, Xu Liu
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume/issue: 1-1
Identifier
DOI: 10.1109/tcsvt.2024.3402242
Abstract
With the development of multimedia technology, events are increasingly presented in multimedia form, which makes multimedia event extraction (MEE) ever more important. Existing MEE methods usually align the textual and visual modalities with simple strategies, which makes it difficult to extract events and their arguments precisely from complex multimedia documents. To address this problem, we propose a novel Multi-grained Gradual Inference Model (MGIM) that infers and interprets events in complex multimedia structures in a coarse-to-fine manner. To integrate the textual and visual modalities efficiently, we design a Coarse-grained Alignment (CA) module that represents both modalities as a graph and aligns them at a coarse granularity. Building on the CA module, we further propose a Fine-grained Inference (FI) module that aligns text and image at a fine granularity through multiple rounds of gradual inference. MGIM thus interprets multimedia events at two information granularities, coarse and fine. Extensive experiments on the M2E2 dataset demonstrate the effectiveness of MGIM.
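The abstract gives no equations or implementation details for the CA and FI modules, so the Python below is only a minimal, hypothetical sketch of the general coarse-to-fine alignment idea it describes. It is not MGIM itself. Every name and parameter here is an assumption introduced for illustration: text tokens and image regions are taken to be pre-computed vectors of equal length; the coarse step (`coarse_align`, `threshold`) builds a bipartite graph that keeps a text-image edge when cosine similarity reaches a threshold; the fine step (`fine_infer`, `rounds`, `mix`) runs several rounds in which each text node attends, through a softmax over cosine scores, only to its graph neighbours and moves part of the way toward their weighted average.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_align(text: List[Vector], image: List[Vector],
                 threshold: float = 0.5) -> Dict[int, List[int]]:
    """Hypothetical coarse step (not the paper's CA module): a bipartite graph
    linking text node i to image node j when cosine similarity >= threshold."""
    return {i: [j for j, region in enumerate(image) if cosine(t, region) >= threshold]
            for i, t in enumerate(text)}


def fine_infer(text: List[Vector], image: List[Vector],
               graph: Dict[int, List[int]], rounds: int = 3,
               mix: float = 0.5) -> Tuple[List[Vector], Dict[int, List[float]]]:
    """Hypothetical fine step (not the paper's FI module): in each round, every
    text node attends only to its neighbours in the coarse graph and moves a
    fraction `mix` of the way toward their attention-weighted average.
    Returns the refined text vectors and the last round's attention weights."""
    text = [list(t) for t in text]  # copy so the caller's vectors are untouched
    weights: Dict[int, List[float]] = {}
    for _ in range(rounds):
        refined = []
        for i, t in enumerate(text):
            neighbours = graph.get(i, [])
            if not neighbours:
                weights[i] = []
                refined.append(t)  # no coarse link: node is left unchanged
                continue
            w = softmax([cosine(t, image[j]) for j in neighbours])
            weights[i] = w
            context = [sum(wk * image[j][d] for wk, j in zip(w, neighbours))
                       for d in range(len(t))]
            refined.append([(1 - mix) * x + mix * c for x, c in zip(t, context)])
        text = refined  # the next round re-scores with the refined vectors
    return text, weights


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy "token" embeddings
    image = [[0.9, 0.1], [0.1, 0.9]]              # toy "region" embeddings
    graph = coarse_align(text, image)
    print(graph)  # {0: [0], 1: [1], 2: []}
    refined, weights = fine_infer(text, image, graph)
    print(refined, weights)
```

Restricting the fine-grained attention to the edges of the coarse graph is one simple way to make later rounds depend on the earlier, coarser alignment. A learned model would presumably replace the fixed cosine scores, threshold, and `mix` factor with trainable components.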