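Closed captioning
Reinforcement learning
Computer science
Artificial intelligence
Convolutional neural network
Task (project management)
Image (mathematics)
Artificial neural network
Focus (optics)
Object (grammar)
Machine learning
Optics
Physics
Economics
Management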
Authors
Zhibo Zhou,Xiaoming Zhang,Zhoujun Li,Feiran Huang,Jie Xu
Source
Journal: Big Data
[Mary Ann Liebert]
Date: 2022-12-01
Volume (Issue): 10 (6): 481-492
Citations: 4
Identifier
DOI:10.1089/big.2021.0049
Abstract
The analysis of large-scale multimodal data has become very popular recently. Image captioning, whose goal is to automatically describe the content of an image in natural language, is an essential and challenging task in artificial intelligence. Most existing image captioning methods combine a Convolutional Neural Network with a Recurrent Neural Network. These methods either attend to a global representation at the image level or focus only on specific concepts, such as regions and objects. To make full use of the characteristics of a given image, in this study we present a novel model named Multilevel Attention Networks and Policy Reinforcement Learning for image caption generation. Specifically, our model is composed of a multilevel attention network module and a policy reinforcement learning module. In the multilevel attention network, the object-attention network aims to capture global and local details about objects, whereas the region-attention network obtains global and local features about regions. A policy reinforcement learning algorithm is then adopted to overcome the exposure bias problem in the training phase and to solve the loss-evaluation mismatch problem at the caption generation stage. With the attention network and the policy algorithm, our model can automatically generate accurate and natural sentences for a given image. We carry out extensive experiments on the MSCOCO and Flickr30k data sets, demonstrating that our model is superior to other competitive methods.
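The abstract does not say which policy reinforcement learning algorithm is used. As a rough illustration only, the sketch below shows a generic REINFORCE-style loss with a baseline, a common way to train a captioner on its own sampled captions and on a sentence-level score. It is not the authors' implementation. The function names, the `reward` value (assumed here to be some caption-quality score), and the `baseline` are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_loss(step_logits, sampled_tokens, reward, baseline=0.0):
    """Surrogate loss -(reward - baseline) * sum_t log p(w_t) for one sampled caption.

    step_logits:    one list of vocabulary logits per decoding step (hypothetical model output)
    sampled_tokens: token index sampled from the model at each step
    reward:         sentence-level score of the sampled caption (assumed, e.g. a caption metric)
    baseline:       reference score used to reduce variance (assumed)
    """
    log_prob = 0.0
    for logits, token in zip(step_logits, sampled_tokens):
        log_prob += math.log(softmax(logits)[token])
    return -(reward - baseline) * log_prob


# Two decoding steps over a toy two-word vocabulary with uniform logits.
loss = reinforce_loss([[0.0, 0.0], [0.0, 0.0]], [0, 1], reward=1.0, baseline=0.5)
```

Minimizing this loss raises the probability of sampled captions that score above the baseline and lowers it for those that score below. Because the captions are sampled from the model itself and scored at the sentence level, this kind of objective is commonly used against both exposure bias and the gap between the training loss and the evaluation metric.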