Retrieval Augmented Convolutional Encoder-decoder Networks for Video Captioning

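Closed captioning, Computer science, Sentence, Encoder, Speech recognition, Convolutional neural network, Natural language processing, Artificial intelligence, Decoding methods, Encoding, Image (mathematics), Algorithm, Biochemistry, Chemistry, Gene, Operating system
Authors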
Jingwen Chen,Yingwei Pan,Yehao Li,Ting Yao,Hongyang Chao,Tao Mei
Source
Journal: ACM Transactions on Multimedia Computing, Communications, and Applications [Association for Computing Machinery]
Volume (Issue): 19 (1s): 1-24 Citations: 12
Identifier
DOI:10.1145/3539225
Abstract

Video captioning, an emerging research topic in computer vision, aims to generate a natural sentence that correctly reflects the visual content of a video. The well-established approach relies on the encoder-decoder paradigm, learning to encode the input video and decode the variable-length output sentence in a sequence-to-sequence manner. Nevertheless, these approaches often fail to produce complex and descriptive sentences as natural as those written by human beings, since the models are incapable of memorizing all visual contents and syntactic structures in the human-annotated video-sentence pairs. In this article, we introduce a Retrieval Augmentation Mechanism (RAM) that enables explicit reference to existing video-sentence pairs within any encoder-decoder captioning model. Specifically, for each query video, a video-sentence retrieval model is first used to fetch semantically relevant sentences from the training sentence pool, coupled with the corresponding training videos. RAM then writes the relevant video-sentence pairs into memory and reads the memorized visual contents and syntactic structures of these pairs from memory to facilitate word prediction at each timestep. Furthermore, we present the Retrieval Augmented Convolutional Encoder-Decoder Network (R-ConvED), which integrates RAM into a convolutional encoder-decoder structure to boost video captioning. Extensive experiments on the MSVD, MSR-VTT, ActivityNet Captions, and VATEX datasets validate the superiority of our proposals and demonstrate quantitatively compelling results.
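
The retrieve, write, and read steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes videos and sentences are already embedded in a shared vector space, ranks sentences by cosine similarity, and reads memory with plain softmax attention over key-value slots. The function names (`retrieve`, `write_memory`, `read_memory`), the dictionary layout of the pool, and the choice of video embeddings as keys and sentence embeddings as values are all hypothetical choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_video, pool, k):
    """Return the k training pairs whose sentence embedding best matches the query video.

    Assumption: videos and sentences live in one joint embedding space.
    """
    ranked = sorted(pool, key=lambda p: cosine(query_video, p["sentence"]), reverse=True)
    return ranked[:k]


def write_memory(pairs):
    """Store each retrieved pair as a (key, value) slot.

    Assumption: the video embedding is the key and the sentence embedding the value.
    """
    return [(p["video"], p["sentence"]) for p in pairs]


def read_memory(state, memory):
    """Softmax attention over memory keys; returns the weighted sum of values and the weights."""
    scores = [sum(s * k for s, k in zip(state, key)) for key, _ in memory]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(memory[0][1])
    context = [sum(w * value[i] for w, (_, value) in zip(weights, memory)) for i in range(dim)]
    return context, weights


# Toy training pool of three video-sentence pairs with 2-d embeddings (made-up numbers).
pool = [
    {"video": [1.0, 0.0], "sentence": [0.9, 0.1]},
    {"video": [0.0, 1.0], "sentence": [0.1, 0.9]},
    {"video": [0.7, 0.7], "sentence": [0.6, 0.8]},
]
query_video = [1.0, 0.2]

memory = write_memory(retrieve(query_video, pool, k=2))
# At each decoding timestep, the decoder state would query the memory.
context, weights = read_memory([1.0, 0.0], memory)
```

In a captioning model, the `context` vector would be combined with the decoder state before predicting the next word; how that fusion is done in R-ConvED is not specified in the abstract, so it is left out here.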