Coherent Visual Storytelling via Parallel Top-Down Visual and Topic Attention

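Computer science; Paragraph; Closed captioning; Phrase; Generator (circuit theory); Artificial intelligence; Visual search; Sentence; Storytelling; Natural language processing; Information retrieval; Narrative; Image (mathematics); Linguistics; Quantum mechanics; Physics; World Wide Web; Philosophy; Power (physics)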
Authors
Jinjing Gu, Hanli Wang, Ruichao Fan
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology [Institute of Electrical and Electronics Engineers]
Volume (issue): 33 (1): 257-268 Citations: 5
Identifier
DOI: 10.1109/tcsvt.2022.3199603
Abstract

Visual storytelling aims to automatically produce a narrative paragraph for a given photo album. It poses more challenges than describing individual images with paragraphs, mainly because it is difficult to preserve coherent topics and to generate diverse phrases that depict the rich content of a photo album. Existing attention-based models that lack higher-level guiding information often produce sentences that deviate from the topic expressed by the image. In addition, widely applied language generation approaches that employ standard beam search tend to produce monotonous descriptions. In this work, a coherent visual storytelling (CoVS) framework is designed to address these problems. Specifically, in the encoding phase, an image sequence encoder is designed to efficiently extract visual features of the input photo album. Then, the novel parallel top-down visual and topic attention (PTDVTA) decoder is constructed from a topic-aware neural network, a parallel top-down attention model, and a coherent language generator. Concretely, visual attention focuses on the attributes and relationships of the objects, while topic attention, which integrates a topic-aware neural network, can improve the coherence of the generated sentences. Finally, a phrase beam search algorithm with $n$-gram Hamming diversity is designed to improve the expression diversity of the generated story. To validate the proposed CoVS framework, extensive experiments are conducted on the VIST dataset, which show that CoVS can automatically generate coherent and diverse stories in a more natural way. Moreover, CoVS outperforms state-of-the-art baselines on BLEU-4 and METEOR scores, while maintaining good CIDEr and ROUGE_L scores. The source code of this work can be found at https://mic.tongji.edu.cn.
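To make the idea of an $n$-gram diversity penalty during beam search more concrete, the following is a minimal sketch and not the CoVS implementation. It assumes a simple form of the penalty: a candidate sentence loses score for every $n$-gram it shares with previously selected hypotheses. The function names (`ngrams`, `ngram_overlap_penalty`, `rerank`) and the `strength` parameter are hypothetical and chosen only for illustration; the paper's phrase beam search may define the penalty differently.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_overlap_penalty(candidate: Sequence[str],
                          previous: Sequence[Sequence[str]],
                          n: int = 2) -> int:
    """Count how often the candidate's n-grams occur in earlier hypotheses.

    Assumption: this count stands in for an n-gram Hamming-style
    dissimilarity term; a higher value means less diverse.
    """
    seen = Counter()
    for hyp in previous:
        seen.update(ngrams(hyp, n))
    return sum(seen[g] for g in ngrams(candidate, n))


def rerank(candidates: Sequence[Tuple[List[str], float]],
           previous: Sequence[Sequence[str]],
           n: int = 2,
           strength: float = 0.5) -> List[Tuple[float, List[str]]]:
    """Re-score (tokens, log_prob) pairs by subtracting a weighted penalty."""
    scored = [
        (log_prob - strength * ngram_overlap_penalty(tokens, previous, n), tokens)
        for tokens, log_prob in candidates
    ]
    # Highest adjusted score first.
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

Under these assumptions, a candidate with a slightly lower log-probability can outrank a more probable one if the latter repeats bigrams from a sentence already generated for the story, which is the kind of behavior a diversity term is meant to encourage.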
