Computer Science
Remote Sensing
Segmentation
Image Segmentation
Artificial Intelligence
Computer Vision
Geology
Authors
Qingpeng Wang, Wei Chen, Zhou Huang, Hongzhao Tang, Lan Yang
Source
Journal: IEEE Transactions on Geoscience and Remote Sensing
Publisher: Institute of Electrical and Electronics Engineers
Date: 2024-01-01
Volume: 62, Pages: 1-24
Citations: 4
Identifier
DOI: 10.1109/TGRS.2024.3390750
Abstract
Semantic segmentation is an essential technique in remote sensing. Until recently, most related research focused on advancing semantic segmentation models for monomodal imagery, and less attention has been given to models that exploit multimodal remote sensing data. Moreover, most current multimodal approaches handle only limited bimodal settings and cannot utilize three or more modalities simultaneously. The steep computational cost of previous feature fusion paradigms further hinders their application in broader cases. How to design a unified method that accommodates an arbitrary, quantity-agnostic set of modalities for multimodal semantic segmentation remains an unsolved problem. To address these challenges, this study explores a feasible approach and proposes a cost-effective multimodal sensing semantic segmentation model (MultiSenseSeg). MultiSenseSeg employs multiple lightweight modality-specific experts (MSEs), an adaptive multimodal matching (AMM) module, and a single feature extraction pipeline to efficiently model intra- and inter-modal relationships. Benefiting from these designs, MultiSenseSeg serves as a unified multimodal model that handles both monomodal and bimodal cases and readily extrapolates to scenarios with more modalities, thereby achieving semantic segmentation over arbitrary quantities of multimodal data. To evaluate its performance, we select several state-of-the-art (SOTA) semantic segmentation models from the past three years and conduct extensive experiments on two public multimodal datasets. The results show that MultiSenseSeg not only achieves higher accuracy but also offers user-friendly modality extrapolation, allowing end-to-end training on limited, consumer-grade hardware. The model's code will be available at https://github.com/W-qp/MultiSenseSeg.
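The abstract names three components: lightweight modality-specific experts (MSEs), an adaptive multimodal matching (AMM) module, and a single shared feature extraction pipeline, with the official implementation at the GitHub link above. Below is a minimal PyTorch sketch of that quantity-agnostic design, assuming simple convolutional stems for the MSEs and a learned softmax gate for the AMM; the class names, layer choices, and gating scheme here are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn


class ModalitySpecificExpert(nn.Module):
    """Hypothetical stand-in for the paper's lightweight MSE: a small
    convolutional stem that maps one modality to a shared embedding."""

    def __init__(self, in_channels: int, embed_dim: int):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, embed_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.stem(x)


class AdaptiveMultimodalMatching(nn.Module):
    """Hypothetical AMM: scores each modality's feature map with a learned
    gate, then fuses a variable-length list of features by softmax-weighted
    summation, so the module is agnostic to the number of modalities."""

    def __init__(self, embed_dim: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # (B, C, 1, 1)
            nn.Conv2d(embed_dim, 1, kernel_size=1),  # one scalar score per map
        )

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        scores = torch.cat([self.gate(f) for f in feats], dim=1)  # (B, M, 1, 1)
        weights = torch.softmax(scores, dim=1)
        return sum(weights[:, i:i + 1] * f for i, f in enumerate(feats))


class MultimodalSegSketch(nn.Module):
    """One expert per modality -> AMM fusion -> a single shared pipeline.
    The 'pipeline' here is a placeholder head, not the paper's backbone."""

    def __init__(self, modal_channels: list[int], embed_dim: int = 64,
                 num_classes: int = 6):
        super().__init__()
        self.experts = nn.ModuleList(
            ModalitySpecificExpert(c, embed_dim) for c in modal_channels
        )
        self.amm = AdaptiveMultimodalMatching(embed_dim)
        self.pipeline = nn.Sequential(
            nn.Conv2d(embed_dim, embed_dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(embed_dim, num_classes, kernel_size=1),
        )

    def forward(self, inputs: list[torch.Tensor]) -> torch.Tensor:
        feats = [expert(x) for expert, x in zip(self.experts, inputs)]
        return self.pipeline(self.amm(feats))


# Example: a bimodal case (RGB + single-band DSM), extendable to more inputs.
model = MultimodalSegSketch(modal_channels=[3, 1])
rgb = torch.randn(2, 3, 128, 128)
dsm = torch.randn(2, 1, 128, 128)
logits = model([rgb, dsm])  # shape: (2, 6, 128, 128)
```

Because the experts live in an nn.ModuleList and the fusion iterates over whichever features are supplied, extending this sketch from two modalities to three is just a matter of passing, say, modal_channels=[3, 1, 4], which mirrors the modality extrapolation property described in the abstract.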