Topics
Multi-modality
Computer science
Image segmentation
Transformer (deep learning)
Computer vision
Semantic segmentation
Artificial intelligence
Remote sensing
Authors
Yuheng Liu, Ye Wang, Yifan Zhang, Shaohui Mei
Identifier
DOI: 10.1109/whispers61460.2023.10431036
Abstract
Multi-modal semantic segmentation of remote sensing (RS) images is challenging due to the complex relationships between modalities and the large intra-class variance of objects in RS images. Most existing semantic segmentation methods exploit information from only a single modality, which is insufficient for accurate segmentation. To address this problem, this paper proposes a novel multimodal global-local transformer segmentor (MMGLOTS) for the multi-modal semantic segmentation task. Specifically, the semantic features of each modality are extracted by a multi-modal semantic feature extractor (MMSFE) with an adaptive fusion strategy. The features are then aggregated, and deep representations capturing both local and global dependencies are obtained by a global-local transformer (GLT). The final prediction is produced by progressively restoring the deep representations with a prediction restorer (PR). Extensive experiments on two multi-modal semantic segmentation datasets show that the method achieves superior performance, and it won first place in the newly held Cross-City Multi-modal Semantic Segmentation Challenge 2023.
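To make the three-stage pipeline described in the abstract concrete, below is a minimal PyTorch sketch of the MMSFE → GLT → PR flow. The module names follow the abstract, but everything else is an assumption for illustration: the convolutional encoders, the softmax-weighted adaptive fusion, the attention configuration, and all shapes and hyperparameters are placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MMSFE(nn.Module):
    """Per-modality feature extraction with adaptive fusion.
    The conv backbones and softmax-weighted fusion are assumptions;
    the abstract does not specify the actual design."""
    def __init__(self, in_channels=(3, 1), dim=64):
        super().__init__()
        self.encoders = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(c, dim, 3, stride=4, padding=1),
                nn.ReLU(),
                nn.Conv2d(dim, dim, 3, padding=1),
            )
            for c in in_channels
        ])
        # One learnable logit per modality -> softmax fusion weights.
        self.fusion_logits = nn.Parameter(torch.zeros(len(in_channels)))

    def forward(self, modalities):
        feats = [enc(x) for enc, x in zip(self.encoders, modalities)]
        weights = torch.softmax(self.fusion_logits, dim=0)
        return sum(w * f for w, f in zip(weights, feats))

class GLT(nn.Module):
    """Global-local block: a depthwise conv supplies local context,
    self-attention over all positions supplies global context."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        b, c, h, w = x.shape
        local = self.local(x)
        tokens = x.flatten(2).transpose(1, 2)          # (B, HW, C)
        glob, _ = self.attn(tokens, tokens, tokens)
        glob = self.norm(glob).transpose(1, 2).reshape(b, c, h, w)
        return x + local + glob                        # residual combination

class PR(nn.Module):
    """Prediction restorer: restores the deep representation
    to full resolution and projects to class logits."""
    def __init__(self, dim=64, num_classes=8, scale=4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(dim, num_classes, 1),
            nn.Upsample(scale_factor=scale, mode="bilinear",
                        align_corners=False),
        )

    def forward(self, x):
        return self.head(x)

class MMGLOTS(nn.Module):
    def __init__(self, in_channels=(3, 1), dim=64, num_classes=8):
        super().__init__()
        self.mmsfe = MMSFE(in_channels, dim)
        self.glt = GLT(dim)
        self.pr = PR(dim, num_classes)

    def forward(self, modalities):
        return self.pr(self.glt(self.mmsfe(modalities)))

# Hypothetical usage: an RGB image plus a single-band height map (DSM).
rgb = torch.randn(2, 3, 128, 128)
dsm = torch.randn(2, 1, 128, 128)
logits = MMGLOTS()([rgb, dsm])   # -> (2, 8, 128, 128)
```

The residual sum in GLT mirrors the abstract's claim that both local and global dependencies feed the deep representation; in the actual paper the fusion and restoration stages are likely more elaborate than this single-scale sketch.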