Concepts
Closed captioning
Interpretability
Discriminative model
Computer science
Artificial intelligence
Feature (linguistics)
Sentence
Similarity (geometry)
Word (group theory)
Modality (human-computer interaction)
Natural language processing
Pattern recognition (psychology)
Image (mathematics)
Computer vision
Mathematics
Linguistics
Philosophy
Geometry
Authors
Qi Wang, Wei Huang, Xueting Zhang, Xuelong Li
Source
Journal: IEEE Transactions on Cybernetics
[Institute of Electrical and Electronics Engineers]
Date: 2023-11-01
Volume/Issue: 53 (11): 6910-6922
Citations: 3
Identifiers
DOI: 10.1109/tcyb.2022.3222606
Abstract
Remote sensing image captioning (RSIC), which describes a remote sensing image with a semantically related sentence, has been a cross-modal challenge between computer vision and natural language processing. Among the visual features extracted from remote sensing images, global features provide the complete and comprehensive visual relevance of all the words of a sentence simultaneously, while local features can emphasize the discrimination of these words individually. Therefore, not only are global features important for caption generation, but local features are also meaningful for making the words more discriminative. To make full use of the advantages of both global and local features, in this article we propose an attention-based global-local captioning model (GLCM) to obtain a global-local visual feature representation for RSIC. Based on the proposed GLCM, the correlation among all the generated words, as well as the relation between each individual word and its most related local visual features, can be visualized in a similarity-based manner, which provides more interpretability for RSIC. In extensive experiments, our method achieves comparable results on UCM-captions and superior results on Sydney-captions and RSICD, the largest RSIC dataset.
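The abstract describes fusing a global image feature with attention-weighted local features at each word-generation step. The sketch below is a minimal, hypothetical illustration of such a fusion step, reconstructed only from the abstract rather than from the authors' GLCM implementation; all module names, dimensions, and the choice of an LSTM decoder are assumptions.

```python
# Hypothetical sketch of attention-based global-local feature fusion for
# caption decoding. Not the authors' released code; names and dimensions
# are illustrative only.
import torch
import torch.nn as nn


class GlobalLocalAttentionDecoder(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.lstm = nn.LSTMCell(hidden_dim + feat_dim, hidden_dim)
        # Scores each local region feature against the current hidden state.
        self.attn = nn.Linear(feat_dim + hidden_dim, 1)
        # Fuses the global feature with the attended local context.
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, word_ids, global_feat, local_feats, state):
        # word_ids: (batch,); global_feat: (batch, feat_dim)
        # local_feats: (batch, num_regions, feat_dim); state: (h, c)
        h, c = state
        # Attention weights over local regions, conditioned on the hidden state.
        h_exp = h.unsqueeze(1).expand(-1, local_feats.size(1), -1)
        scores = self.attn(torch.cat([local_feats, h_exp], dim=-1)).squeeze(-1)
        alpha = torch.softmax(scores, dim=-1)  # (batch, num_regions)
        local_ctx = (alpha.unsqueeze(-1) * local_feats).sum(dim=1)
        # Global-local fusion: the global feature carries sentence-level
        # relevance; the attended local context sharpens the current word.
        visual = self.fuse(torch.cat([global_feat, local_ctx], dim=-1))
        h, c = self.lstm(torch.cat([self.embed(word_ids), visual], dim=-1), (h, c))
        return self.out(h), (h, c), alpha
```

In a sketch like this, the per-step attention weights (alpha) are the quantity a similarity-based visualization would inspect to relate each generated word to its most relevant image regions, which is the kind of interpretability the abstract claims.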