Emotional Video Captioning with Vision-based Emotion Interpretation Network

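Keywords: Closed captioning, Computer science, Artificial intelligence, Computer vision, Interpretation (philosophy), Visualization, Emotion recognition, Natural language processing, Speech recognition, Image (mathematics), Programming language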
Authors
Peipei Song, Dan Guo, Xun Yang, Shengeng Tang, Meng Wang
Source
Journal: IEEE Transactions on Image Processing [Institute of Electrical and Electronics Engineers]
Volume/Issue: 1-1
Identifier
DOI:10.1109/tip.2024.3359045
Abstract

Effectively summarizing and re-expressing video content in natural language, in a more human-like fashion, is one of the key topics in multimedia content understanding. Despite good progress in recent years, existing efforts have usually overlooked the emotions in user-generated videos, which makes the generated sentences somewhat bland and soulless. To fill this research gap, this paper presents a novel emotional video captioning framework in which we design a Vision-based Emotion Interpretation Network to capture the emotions conveyed in videos and describe the visual content in both factual and emotional language. Specifically, we first model the emotion distribution over an open psychological vocabulary to predict the emotional state of a video. Then, guided by the discovered emotional state, we incorporate visual context, textual context, and visual-textual relevance into an aggregated multimodal contextual vector to enhance video captioning. Furthermore, we optimize the network in a new emotion-fact coordinated way that involves two losses, the Emotional Indication Loss and the Factual Contrastive Loss, which penalize errors in emotion prediction and in visual-textual factual relevance, respectively. In other words, we introduce emotional representation learning into an end-to-end video captioning network. Extensive experiments on the public benchmark datasets EmVidCap and EmVidCap-S demonstrate that our method outperforms state-of-the-art methods by a large margin. Quantitative ablation studies and qualitative analyses show that our method effectively captures the emotions in videos and thus generates emotional sentences that interpret the video content.
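The abstract names the two training objectives but does not give their formulas. As a rough illustration only, the sketch below assumes that an emotion-prediction loss can be written as a cross-entropy between a target emotion distribution and a softmax over emotion-vocabulary scores, and that a visual-textual factual loss can be written as a standard InfoNCE contrastive loss over matched video/sentence embeddings. These are generic stand-ins, not the paper's definitions; every function name, the temperature, and the loss weights are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def emotion_indication_loss(emotion_scores: Sequence[float],
                            target_dist: Sequence[float],
                            eps: float = 1e-12) -> float:
    """Hypothetical stand-in: cross-entropy between a target distribution over an
    emotion vocabulary and the softmax of the model's per-emotion scores."""
    pred = softmax(emotion_scores)
    return -sum(t * math.log(p + eps) for t, p in zip(target_dist, pred))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (na * nb)


def factual_contrastive_loss(video_embs: Sequence[Sequence[float]],
                             text_embs: Sequence[Sequence[float]],
                             temperature: float = 0.1) -> float:
    """Generic InfoNCE (assumed, not the paper's formula): the i-th video embedding
    should be more similar to the i-th sentence than to every other sentence."""
    n = len(video_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(video_embs[i], text_embs[j]) / temperature for j in range(n)]
        probs = softmax(logits)
        total += -math.log(probs[i])  # negative log-probability of the matched pair
    return total / n


def joint_loss(caption_nll: float, l_emo: float, l_fact: float,
               lambda_emo: float = 1.0, lambda_fact: float = 1.0) -> float:
    """Hypothetical weighted sum of a captioning loss and the two auxiliary losses."""
    return caption_nll + lambda_emo * l_emo + lambda_fact * l_fact


if __name__ == "__main__":
    videos = [[1.0, 0.0], [0.0, 1.0]]
    print(factual_contrastive_loss(videos, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs
    print(factual_contrastive_loss(videos, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs
```

In a trainable model these would be tensor operations with automatic differentiation over mini-batches; plain Python is used here only to keep the arithmetic visible. The matched pairs yield a much smaller contrastive loss than the swapped pairs, which is the behavior a factual-relevance penalty is meant to encourage.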