Fine-Grained Video Captioning via Graph-based Multi-Granularity Interaction Learning

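Computer science; Closed captioning; Narrative; Granularity; Task (project management); Artificial intelligence; Natural language processing; Human-computer interaction; Multimedia; Image (mathematics); Operating system; Philosophy; Linguistics; Management; Economics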
Authors
Yichao Yan, Ning Zhuang, Bingbing Ni, Jian Zhang, Minghao Xu, Qiang Zhang, Zheng Zhang, Shuo Cheng, Qi Tian, Yi Xu, Xiaokang Yang, Wenjun Zhang
Source
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence [Institute of Electrical and Electronics Engineers]
Volume (issue): 44 (2): 666-683; Citations: 18
Identifier
DOI: 10.1109/tpami.2019.2946823
Abstract

Learning to generate continuous linguistic descriptions of multi-subject interactive videos in great detail has particular applications in team sports auto-narrative. In contrast to traditional video captioning, this task is more challenging, as it requires simultaneously modeling fine-grained individual actions, uncovering the spatio-temporal dependency structures of frequent group interactions, and then accurately mapping these complex interaction details into long and detailed commentary. To explicitly address these challenges, we propose a novel framework, Graph-based Learning for Multi-Granularity Interaction Representation (GLMGIR), for the fine-grained team sports auto-narrative task. A multi-granular interaction modeling module is proposed to extract the subjects' interactive actions in a progressive way, encoding both intra- and inter-team interactions. Based on these multi-granular representations, a multi-granular attention module is developed to consider action/event descriptions at multiple spatio-temporal resolutions. The two modules are integrated seamlessly and work collaboratively to generate the final narrative. Meanwhile, to facilitate reproducible research, we collect a new video dataset from YouTube.com called the Sports Video Narrative dataset (SVN). It represents a novel direction, as it contains 6K team sports videos (i.e., NBA basketball games) with 10K ground-truth narratives (e.g., sentences). Furthermore, as previous metrics such as METEOR (used in coarse-grained video captioning tasks) do not cope well with the fine-grained sports narrative task, we develop a novel evaluation metric named Fine-grained Captioning Evaluation (FCE), which measures how accurately the generated linguistic description reflects fine-grained action details as well as the overall spatio-temporal interactional structure. Extensive experiments on our SVN dataset demonstrate the effectiveness of the proposed framework for fine-grained team sports video auto-narrative.
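
The abstract names two components, progressive multi-granular interaction modeling over a graph and attention across granularities, without giving architectural detail. The following is only a toy sketch of that general idea, not the GLMGIR implementation: the mean-aggregation message passing, the dot-product attention, the player names, the edges, and the 2-d features are all assumptions made for illustration.

```python
import math

def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def message_pass(features, edges):
    """One round of mean aggregation: each node averages itself and its neighbours."""
    neighbours = {node: [node] for node in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return {node: mean([features[m] for m in members])
            for node, members in neighbours.items()}

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, candidates):
    """Dot-product attention: weight candidates by similarity to the query."""
    scores = [sum(q * c for q, c in zip(query, cand)) for cand in candidates]
    weights = softmax(scores)
    context = [sum(w * cand[i] for w, cand in zip(weights, candidates))
               for i in range(len(query))]
    return weights, context

# Hypothetical toy input: four players, two per team, each with a 2-d action feature.
features = {"a1": [1.0, 0.0], "a2": [0.0, 1.0], "b1": [1.0, 1.0], "b2": [0.0, 0.0]}
teams = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
edges = [("a1", "a2"), ("b1", "b2"), ("a1", "b1")]  # the last edge crosses teams

# Three granularities (assumed): individual -> team (intra-team) -> global (inter-team).
individual = message_pass(features, edges)
team_level = {t: mean([individual[p] for p in members]) for t, members in teams.items()}
global_level = mean(list(team_level.values()))

# A decoder state (here a fixed, made-up query) attends over all granularities at once.
candidates = list(individual.values()) + list(team_level.values()) + [global_level]
weights, context = attend([1.0, 0.0], candidates)
```

In this sketch the attention weights form a distribution over seven candidate vectors (four individual, two team, one global), so a caption decoder could in principle draw on player-level detail or team-level context at each word. How the actual framework builds its graph and conditions its attention is not stated in the abstract.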