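Closed captioning
Underwater
Computer science
Feature (linguistics)
Artificial intelligence
Image fusion
Computer vision
Image (mathematics)
Geology
Linguistics
Oceanography
Philosophy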
Authors
Li Li, Yanan Wei, Peng Ren
Identifier
DOI:10.1145/3647649.3647700
Abstract
Image captioning employs artificial intelligence to translate visual content into natural-language text descriptions. Underwater image captioning offers specialized interpretation for scenarios such as underwater environmental monitoring, underwater archaeology, and offshore platforms. It is also an effective way to compress information when large numbers of underwater images must be transmitted in real time over underwater acoustic communication. In this article, we annotate an underwater image caption dataset for this task and create a baseline using the encoder-decoder neural image caption model, which outputs complete sentences related to the image content. Descriptions of underwater images mainly focus on the underwater scene and objects. An object detection model based on Faster R-CNN is applied to extract full-image features and the regional features corresponding to the targets in the image. For the caption model, we enhance the input features of the language generator by combining global information, regional details, contextual cues, and pre-ordered text information through feature fusion. This enables the generator to output precise semantic expressions related to salient objects. The method was applied to the annotated underwater image caption dataset and produced more accurate descriptions of underwater targets than sentences generated by a basic neural network model. It also achieved higher scores on the evaluation metrics, affirming the effectiveness of our approach.
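The feature fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the fusion operator, so the code below assumes soft attention over regional features followed by plain concatenation, and it reads "pre-ordered text information" as the embedding of the previously generated word. The function names (`attend`, `fuse`) and all vector sizes are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(regions, context):
    """Soft attention (assumed): weight each regional feature by its
    dot-product similarity to the context vector, then average."""
    weights = softmax([dot(region, context) for region in regions])
    dim = len(regions[0])
    pooled = [
        sum(w * region[i] for w, region in zip(weights, regions))
        for i in range(dim)
    ]
    return pooled, weights


def fuse(global_feat, regions, context, prev_word_emb):
    """Concatenate global, attended regional, contextual, and previous-word
    features into one input vector for the language generator.
    Concatenation is an assumption; the source only says "feature fusion"."""
    pooled, weights = attend(regions, context)
    fused = global_feat + pooled + context + prev_word_emb  # list concatenation
    return fused, weights


# Toy inputs: in a real system these would come from a detector such as
# Faster R-CNN and from the decoder's hidden state and word embeddings.
global_feat = [0.5, 0.1]
regions = [[1.0, 0.0], [0.0, 1.0]]
context = [2.0, 0.0]
prev_word_emb = [0.3, 0.3, 0.3]

fused, weights = fuse(global_feat, regions, context, prev_word_emb)
```

In this toy setup the first region is more similar to the context vector, so it receives the larger attention weight, and the fused vector has one slot per input feature dimension. A learned model would typically replace the dot-product scoring and the concatenation with trainable layers.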