Event Graph Guided Compositional Spatial-Temporal Reasoning for Video Question Answering

Authors
Ziyi Bai, Ruiping Wang, Difei Gao, Xilin Chen
Source
Journal: IEEE Transactions on Image Processing (Institute of Electrical and Electronics Engineers)
Volume/issue: 1-1; Times cited: 1
Identifier
DOI: 10.1109/tip.2024.3358726
Abstract

Video question answering (VideoQA) is challenging because the model must extract and combine multi-level visual concepts, from local objects to global actions, out of complex events in order to perform compositional reasoning. Existing works represent the video with fixed-duration clip features, which makes it hard for the model to capture crucial concepts at multiple granularities. To overcome this shortcoming, we propose to represent the video with an Event Graph in a hierarchical structure whose nodes correspond to visual concepts of different levels (object, relation, scene and action) and whose edges indicate their spatial-temporal relationships. We further propose a Hierarchical Spatial-Temporal Transformer (HSTT), which takes nodes from the graph as visual input to realize compositional reasoning guided by the event graph. To fully exploit the spatial-temporal context delivered by the graph structure, on the one hand, we encode the nodes in the order of their semantic hierarchy (depth) and occurrence time (breadth) with our improved graph search algorithm; on the other hand, we introduce edge-guided attention to combine the spatial-temporal context among nodes according to their edge connections. HSTT then performs QA through cross-modal interactions enabled by the hierarchical correspondence between the multi-level event graph and the cross-level question. Experiments on the recent challenging AGQA and STAR datasets show that the proposed method outperforms existing VideoQA models by a large margin, including those pre-trained with large-scale external data. Our code is available at https://github.com/ByZ0e/HSTT.
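The abstract names two mechanisms: ordering graph nodes by semantic depth and occurrence time, and edge-guided attention that mixes context only along graph edges. The sketch below illustrates both under explicit assumptions and is not the HSTT implementation (see the repository above for that). Node ordering is simplified to a sort by (level, time) rather than the paper's improved graph search; edge-guided attention is approximated as scaled dot-product attention masked so each node attends only to itself and its neighbours. The `Node` class, the `LEVELS` ranking, the undirected-edge choice, and the toy graph are all hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical depth ranking of the four concept levels (0 = shallowest).
# The abstract names the levels but not their traversal order; this is an assumption.
LEVELS = {"action": 0, "scene": 1, "relation": 2, "object": 3}


@dataclass(frozen=True)
class Node:
    name: str   # concept label, e.g. "cup"
    level: str  # key into LEVELS
    time: int   # frame index where the concept occurs


def order_nodes(nodes):
    """Order nodes by semantic depth first, then by occurrence time.

    A simplified stand-in for the paper's improved graph search, not a reproduction of it.
    """
    return sorted(nodes, key=lambda node: (LEVELS[node.level], node.time))


def edge_mask(ordered, edges):
    """Boolean mask: mask[i][j] is True when node i may attend to node j."""
    idx = {node.name: i for i, node in enumerate(ordered)}
    size = len(ordered)
    mask = [[i == j for j in range(size)] for i in range(size)]  # self-loops
    for a, b in edges:
        i, j = idx[a], idx[b]
        mask[i][j] = mask[j][i] = True  # edges treated as undirected (assumption)
    return mask


def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to graph neighbours."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [
            sum(x * y for x, y in zip(qi, kj)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, kj in enumerate(k)
        ]
        top = max(scores)  # finite, because every node attends to itself
        weights = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0 for masked pairs
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(weights[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return out


# Toy event graph (hypothetical): a person holding a cup during "drinking" in a kitchen.
nodes = [
    Node("person", "object", 0),
    Node("cup", "object", 1),
    Node("holding", "relation", 1),
    Node("kitchen", "scene", 0),
    Node("drinking", "action", 2),
]
edges = [("person", "holding"), ("cup", "holding"), ("holding", "drinking"), ("kitchen", "drinking")]

ordered = order_nodes(nodes)
mask = edge_mask(ordered, edges)
size = len(ordered)
values = [[1.0 if c == r else 0.0 for c in range(size)] for r in range(size)]  # one-hot features
zeros = [[0.0] * size for _ in range(size)]  # zero queries/keys give uniform weights over neighbours
out = masked_attention(zeros, zeros, values, mask)
print([node.name for node in ordered])
print(out[3])  # row for "person": attends only to itself and "holding"
```

With zero queries and keys, each node's output is simply the average of its own and its neighbours' one-hot features, which makes the effect of the edge mask easy to read off. In a learned model the queries, keys and values would come from node embeddings, and the mask is what keeps attention local to the event graph's structure.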
