Computer Science
Remote Sensing
Segmentation
Image Segmentation
Artificial Intelligence
Computer Vision
Geology
Authors
Qingpeng Wang, Wei Chen, Zhou Huang, Hongzhao Tang, Lan Yang
Source
Journal: IEEE Transactions on Geoscience and Remote Sensing
Publisher: Institute of Electrical and Electronics Engineers
Date: 2024-01-01
Volume: 62, Pages: 1-24
Citations: 4
Identifier
DOI: 10.1109/TGRS.2024.3390750
Abstract
Semantic segmentation is an essential technique in remote sensing. Until recently, most related research focused on advancing semantic segmentation models for monomodal imagery, and less attention has been given to models that exploit multimodal remote sensing data. Moreover, most current multimodal approaches handle only limited bimodal settings and cannot utilize three or more modalities simultaneously. The steep computational cost of previous feature fusion paradigms further hinders their application in broader cases. How to design a unified method that accommodates an arbitrary, quantity-agnostic set of modalities for multimodal semantic segmentation remains an unsolved problem. To address these challenges, this study explores a feasible approach and proposes a cost-effective multimodal sensing semantic segmentation model (MultiSenseSeg). MultiSenseSeg employs multiple lightweight modality-specific experts (MSEs), an adaptive multimodal matching (AMM) module, and a single feature extraction pipeline to efficiently model intra- and inter-modal relationships. Benefiting from these designs, MultiSenseSeg serves as a unified multimodal model that handles both monomodal and bimodal cases and readily extrapolates to scenarios with more modalities, thereby achieving semantic segmentation over arbitrary quantities of multimodal data. To evaluate its performance, we select several state-of-the-art (SOTA) semantic segmentation models from the past three years and conduct extensive experiments on two public multimodal datasets. The results show that MultiSenseSeg not only achieves higher accuracy but also offers user-friendly modality extrapolation, allowing end-to-end training on limited, consumer-grade hardware. The model's code will be available at https://github.com/W-qp/MultiSenseSeg.
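The abstract names three components: lightweight modality-specific experts (MSEs), an adaptive multimodal matching (AMM) module, and a single shared feature extraction pipeline, with the official implementation at the GitHub link above. Below is a minimal PyTorch sketch of that quantity-agnostic design, assuming simple convolutional stems for the MSEs and a learned softmax gate for the AMM; the class names, layer choices, and gating scheme here are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn


class ModalitySpecificExpert(nn.Module):
    """Hypothetical stand-in for the paper's lightweight MSE: a small
    convolutional stem that maps one modality to a shared embedding."""

    def __init__(self, in_channels: int, embed_dim: int):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, embed_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.stem(x)


class AdaptiveMultimodalMatching(nn.Module):
    """Hypothetical AMM: scores each modality's feature map with a learned
    gate, then fuses a variable-length list of features by softmax-weighted
    summation, so the module is agnostic to the number of modalities."""

    def __init__(self, embed_dim: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # (B, C, 1, 1)
            nn.Conv2d(embed_dim, 1, kernel_size=1),  # one scalar score per map
        )

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        scores = torch.cat([self.gate(f) for f in feats], dim=1)  # (B, M, 1, 1)
        weights = torch.softmax(scores, dim=1)
        return sum(weights[:, i:i + 1] * f for i, f in enumerate(feats))


class MultimodalSegSketch(nn.Module):
    """One expert per modality -> AMM fusion -> a single shared pipeline.
    The 'pipeline' here is a placeholder head, not the paper's backbone."""

    def __init__(self, modal_channels: list[int], embed_dim: int = 64,
                 num_classes: int = 6):
        super().__init__()
        self.experts = nn.ModuleList(
            ModalitySpecificExpert(c, embed_dim) for c in modal_channels
        )
        self.amm = AdaptiveMultimodalMatching(embed_dim)
        self.pipeline = nn.Sequential(
            nn.Conv2d(embed_dim, embed_dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(embed_dim, num_classes, kernel_size=1),
        )

    def forward(self, inputs: list[torch.Tensor]) -> torch.Tensor:
        feats = [expert(x) for expert, x in zip(self.experts, inputs)]
        return self.pipeline(self.amm(feats))


# Example: a bimodal case (RGB + single-band DSM), extendable to more inputs.
model = MultimodalSegSketch(modal_channels=[3, 1])
rgb = torch.randn(2, 3, 128, 128)
dsm = torch.randn(2, 1, 128, 128)
logits = model([rgb, dsm])  # shape: (2, 6, 128, 128)
```

Because the experts live in an nn.ModuleList and the fusion iterates over whichever features are supplied, extending this sketch from two modalities to three is just a matter of passing, say, modal_channels=[3, 1, 4], which mirrors the modality extrapolation property described in the abstract.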