A Multiscale Grouping Transformer With CLIP Latents for Remote Sensing Image Captioning

Keywords: Captioning · Computer Science · Remote Sensing · Transformer · Computer Vision · Artificial Intelligence · Image (Mathematics) · Computer Graphics (Images) · Geology · Engineering · Electrical Engineering · Voltage
Authors
Lingwu Meng, Jing Wang, Ran Meng, Yang Yang, Liang Xiao
Source
Journal: IEEE Transactions on Geoscience and Remote Sensing [Institute of Electrical and Electronics Engineers]
Volume/Issue: 62: 1-15 | Cited by: 2
Identifier
DOI: 10.1109/tgrs.2024.3385500
Abstract

Recent progress has shown that integrating multiscale visual features with advanced Transformer architectures is a promising approach for remote sensing image captioning (RSIC). However, the lack of local modeling ability in self-attention may potentially lead to inaccurate contextual information. Moreover, the scarcity of trainable image-caption pairs poses challenges in effectively harnessing the semantic alignment between images and texts. To mitigate these issues, we propose a Multiscale Grouping Transformer with Contrastive Language-Image Pre-training (CLIP) latents (MG-Transformer) for RSIC. First of all, a CLIP image embedding and a set of region features are extracted within a Multi-level Feature Extraction module. To achieve a comprehensive image representation, a Semantic Correlation module is designed to integrate the image embedding and region features with an attention gate. Subsequently, the integrated image features are fed into a Transformer model. The Transformer encoder utilizes dilated convolutions with different dilation rates to obtain multiscale visual features. To enhance the local modeling ability of the self-attention mechanism in the encoder, we introduce a Global Grouping Attention mechanism. This mechanism incorporates a grouping operation into self-attention, allowing each attention head to focus on different contextual information. The Transformer decoder then adopts the Meshed Cross-Attention mechanism to establish relationships between various scales of visual features and text features. This facilitates the generation of captions for images by the decoder. Experimental results on three RSIC datasets demonstrate the superiority of the proposed MG-Transformer. The code will be publicly available at https://github.com/One-paper-luck/MG-Transformer.
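The abstract's Global Grouping Attention idea — splitting attention heads into groups so that each group attends to context at a different scale — can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of a shifted-average as a crude stand-in for a dilated convolution, and all shapes and hyperparameters here are hypothetical simplifications for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dilated_context(x, rate):
    # Crude stand-in for a dilated convolution: blend each token with the
    # token `rate` positions away, so larger rates see wider context.
    return 0.5 * (x + np.roll(x, rate, axis=0))

def grouping_attention(x, num_heads=4, rates=(1, 2)):
    # x: (seq_len, dim). Heads are partitioned into groups, one group per
    # dilation rate, so different heads focus on different-scale context.
    n, d = x.shape
    dh = d // num_heads                      # per-head feature size
    heads_per_group = num_heads // len(rates)
    outs = []
    for g, rate in enumerate(rates):
        ctx = dilated_context(x, rate)       # this group's multiscale context
        for h in range(heads_per_group):
            i = (g * heads_per_group + h) * dh
            q = x[:, i:i + dh]               # queries from the raw features
            k = ctx[:, i:i + dh]             # keys/values from scaled context
            v = ctx[:, i:i + dh]
            attn = softmax(q @ k.T / np.sqrt(dh))
            outs.append(attn @ v)
    return np.concatenate(outs, axis=1)      # (seq_len, dim)

x = np.random.default_rng(0).standard_normal((6, 8))
y = grouping_attention(x)
print(y.shape)  # (6, 8)
```

Each row of the output remains a convex combination over sequence positions, but the two head groups compute their attention against differently dilated views of the input, which is the gist of letting "each attention head focus on different contextual information."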