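Closed captioning
Computer science
Artificial intelligence
Convolutional neural network
Natural language processing
Machine translation
Task (project management)
Deep learning
Construct (python library)
Natural language
Metric (unit)
Task analysis
Key (lock)
Image (mathematics)
Field (mathematics)
Language model
Operations management
Computer security
Management
Mathematics
Pure mathematics
Economics
Programming language
Authors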
Arunkumar Gopu,Pratyush Nishchal,Vishesh Mittal,Kuna Srinidhi
Identifier
DOI:10.1109/inc457730.2023.10263093
Abstract
The automatic generation of image descriptions is a leading research topic at the intersection of computer vision and natural language processing. Image captioning is a key task that calls for a semantic understanding of images and the capacity to create correctly structured descriptions. It is a complex problem, as it often demands information that may not be visible in a given scene, and it requires logical reasoning or in-depth knowledge about the objects present in an image. In this study, we developed a multilayer Convolutional Neural Network (CNN) to produce words that describe the images, and we used Long Short-Term Memory (LSTM) to construct relevant sentences from those words. To generate an accurate description, the CNN model first compares the target image against a large dataset of training samples. We used the Flickr 8k dataset. We used the Bilingual Evaluation Understudy (BLEU) metric to determine how well our model generates captions for the images. BLEU scores generated text against reference text and is commonly used to evaluate the effectiveness of machine translation systems. We also used two pre-trained models (VGG16 and XceptionV3) for a comparative study.
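The abstract relies on BLEU to score generated captions against reference captions. As a rough illustration of what that metric computes, here is a minimal sentence-level sketch in plain Python. It is not the authors' evaluation code: the function names (`ngrams`, `modified_precision`, `brevity_penalty`, `bleu`) are hypothetical, no smoothing is applied, and uniform weights over 1- to 4-grams are an assumption.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as
    often as it appears in the single reference where it is most frequent."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties are broken toward the shorter reference.
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights (assumed)."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


# Hypothetical caption and references, made up for illustration only.
references = ["a dog runs across the grass".split(),
              "a brown dog is running on grass".split()]
candidate = "a dog runs across the grass".split()
score = bleu(candidate, references)
```

Short captions often share no 4-gram with any reference, so an unsmoothed score like this one frequently collapses to zero; practical evaluations usually apply smoothing or report corpus-level BLEU.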