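Closed captioning
Computer science
Feature extraction
Artificial intelligence
Image (mathematics)
Process (computing)
Decoding methods
Feature (linguistics)
Encoder
Expression (computer science)
Key (lock)
Pattern recognition (psychology)
Algorithm
Operating system
Philosophy
Linguistics
Computer security
Programming language
Identifier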
DOI:10.1109/prai59366.2023.10332008
Abstract
In recent years, advances in deep learning and attention mechanisms have driven great progress in image captioning. Traditional image captioning models, however, suffer from insufficient feature extraction and from inaccurate information expression during decoding. To address these problems, this paper builds a model on the encoder-decoder framework. In the encoder, it proposes an improvement based on the ResNeSt network architecture and adds a Squeeze-and-Excitation module to obtain image feature information. In the decoder, it proposes an improved two-layer long short-term memory (LSTM) caption generation model. Through more efficient multi-head attention, the model can understand the relationships between features more accurately and generate more accurate and specific text descriptions based on complete semantic information. Experiments are carried out on the Flickr8k and Flickr30k datasets. A comparative analysis of the evaluation metrics shows that the proposed model can effectively perform image captioning and improve the accuracy of the generated text descriptions.
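The abstract does not say how the Squeeze-and-Excitation module is wired into the encoder, so the following is only a minimal sketch of the general Squeeze-and-Excitation idea (global average pooling, a bottleneck of two fully connected layers, and sigmoid channel gating), not the paper's implementation. The function name `squeeze_excite`, the helper `random_matrix`, the nested-list layout, the random weights, the omitted biases, and the reduction ratio of 4 are all assumptions made for illustration.

```python
import math
import random


def squeeze_excite(feature_map, w_reduce, w_expand):
    """Reweight the channels of a feature map stored as [channel][row][col] lists."""
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w_reduce
    ]
    # Excitation, step 2: expand back to one gate per channel, then apply a sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


def random_matrix(rows, cols, rng):
    """Hypothetical helper: a rows x cols matrix of values in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
channels, height, width, reduction = 8, 4, 4, 4  # assumed toy sizes
feature_map = [random_matrix(height, width, rng) for _ in range(channels)]
w_reduce = random_matrix(channels // reduction, channels, rng)  # shape (C/r, C)
w_expand = random_matrix(channels, channels // reduction, rng)  # shape (C, C/r)
scaled, gates = squeeze_excite(feature_map, w_reduce, w_expand)
```

Each gate lies strictly between 0 and 1, so the block can only attenuate a channel relative to its input. In a trained network the two weight matrices are learned, which lets the encoder emphasize the channels that matter most for the caption.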