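Closed captioning
Residual
Computer science
Artificial intelligence
Context (archaeology)
Encoder
Feature (linguistics)
Coding (social sciences)
Pattern recognition (psychology)
Image (mathematics)
Reinforcement learning
Computer vision
Mathematics
Algorithm
Paleontology
Linguistics
Philosophy
Statistics
Biology
Operating system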
Authors
Zhenrong Deng, Yonglin Zhang, Rui Yang, Rushi Lan, Wenming Huang, Xiliang Luo
Source
Journal: Jisuanji fuzhu sheji yu tuxingxue xuebao
[China Science Publishing & Media Ltd.]
Date: 2021-01-01
Volume (issue): 33 (1): 49-58
Identifier
DOI:10.3724/sp.j.1089.2021.18262
Abstract
To address the insufficient detailed semantic information in current image captioning models based on global features, a Chinese image captioning model combining global and local features is proposed. The proposed model adopts the encoder-decoder framework. In the encoding stage, a residual network (ResNet) and Faster R-CNN are used to extract the global and local features of images, respectively, improving the model's utilization of image features at different scales. A bi-directional gated recurrent unit (BiGRU) with an embedded visual attention structure and a residual connection structure is applied as the decoder (BiGRU with residual connection and attention, BiGRU-RA). The model can adaptively allocate weights to image features and text, and improves the mapping between image feature regions and context information. Additionally, a reinforcement learning-based policy gradient is added to improve the loss function of the model and directly optimize the CIDEr evaluation metric. Training and experiments are conducted on the Chinese captioning dataset of AI Challenger. The comparative results show that the proposed model obtained better
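The policy-gradient step mentioned in the abstract can be illustrated with a minimal REINFORCE-style sketch. The abstract does not give the exact loss, so everything here is an assumption made for illustration: the function name `policy_gradient_loss`, the use of a scalar baseline, and the toy reward values are hypothetical and are not taken from the paper. The reward stands in for a sentence-level CIDEr score of a sampled caption.

```python
import math


def policy_gradient_loss(token_log_probs, reward, baseline=0.0):
    """REINFORCE-style loss for one sampled caption (illustrative only).

    token_log_probs: log-probabilities the decoder assigned to each sampled token.
    reward: sentence-level score of the sampled caption, e.g. a CIDEr value.
    baseline: reference score subtracted to reduce variance (assumed, not from the paper).
    """
    advantage = reward - baseline
    # Minimizing this loss raises the probability of captions that beat the baseline
    # and lowers the probability of captions that fall below it.
    return -advantage * sum(token_log_probs)


# Toy example: a three-token caption sampled with these token probabilities.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.8)]
loss = policy_gradient_loss(log_probs, reward=1.2, baseline=0.9)
```

In a real training loop the log-probabilities would come from the decoder and the gradient would flow through them; the scalar arithmetic above only shows how a metric such as CIDEr can enter the loss as a reward.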