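Computer science
Closed captioning
Benchmark (surveying)
Field (mathematics)
Context (archaeology)
Task (project management)
Artificial intelligence
Data science
Key (lock)
Natural language processing
Human-computer interaction
Image (mathematics)
Systems engineering
Engineering
Cartography
Paleontology
Biology
Pure mathematics
Geography
Computer security
Mathematics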
Authors
Laila Bashmal, Yakoub Bazi, Farid Melgani, Mohamad Mahmoud Al Rahhal, Mansour Zuair
Source
Journal: IEEE Geoscience and Remote Sensing Magazine
[Institute of Electrical and Electronics Engineers]
Date: 2023-12-01
Volume (issue), pages: 11 (4): 63-93
Citations: 1
Identifier
DOI:10.1109/mgrs.2023.3316438
Abstract
The emerging field of vision–language models, which combines computer vision and natural language processing (NLP), has gained significant interest and exploration. This integration has opened up new research opportunities, particularly in remote sensing (RS), where it has the potential to enhance RS systems’ capabilities. In this context, this article presents a comprehensive review of more than 100 articles focusing on the integration of NLP techniques into RS understanding research. The review covers various vision–language modeling tasks, including but not limited to RS image captioning, RS text-to-image retrieval, RS visual question answering (VQA), and RS image generation. For each task, the review provides a summary of the state-of-the-art developments, including methods, evaluation metrics, datasets, and experimental results on benchmark datasets. The review is concluded by discussing the key challenges and highlighting potential research directions for future development, with the aim of inspiring further research in this important field.