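Modal verb
Fusion
Segmentation
Artificial intelligence
Computer science
Computer vision
Materials science
Linguistics
Philosophy
Composite material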
Authors
Xuming Zhang,Xingfa Gu,Qingjiu Tian,Lorenzo Bruzzone
Source
Journal: Cornell University - arXiv
Date: 2024-11-13
Identifier
DOI:10.48550/arxiv.2411.09023
Abstract
Improving hyperspectral image (HSI) semantic segmentation by exploiting complementary information from a supplementary data type (referred to as the X-modality) is promising but challenging due to differences in imaging sensors, image content, and resolution. Current techniques struggle to enhance modality-specific and modality-shared information, and to capture dynamic interaction and fusion between modalities. In response, this study proposes CoMiX, an asymmetric encoder-decoder architecture with deformable convolutions (DCNs) for HSI-X semantic segmentation. CoMiX is designed to extract, calibrate, and fuse information from HSI and X data. Its pipeline comprises an encoder with two parallel, interacting backbones and a lightweight all-multilayer perceptron (ALL-MLP) decoder. The encoder consists of four stages, each incorporating 2D DCN blocks for the X modality to accommodate geometric variations and 3D DCN blocks for HSIs to adaptively aggregate spatial-spectral features. Each stage also includes a Cross-Modality Feature enhancement and eXchange (CMFeX) module and a feature fusion module (FFM). CMFeX exploits spatial-spectral correlations across modalities to recalibrate and enhance modality-specific and modality-shared features while adaptively exchanging complementary information between them. Outputs from CMFeX are fed into the FFM for fusion and passed to the next stage for further feature learning. Finally, the outputs of all FFMs are integrated by the ALL-MLP decoder for the final prediction. Extensive experiments demonstrate that CoMiX achieves superior performance and generalizes well to various multimodal recognition tasks. The CoMiX code will be released.
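The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (the CoMiX code had not been released at the time of this record). Every function below is a hypothetical placeholder that uses simple arithmetic on lists of floats in place of the real DCN blocks, CMFeX, FFM, and ALL-MLP decoder. It only shows the wiring: two branches per stage, exchange, fusion, and a decoder that consumes the fused output of every stage. Downsampling between stages and upsampling in the decoder are omitted.

```python
from typing import List, Tuple

Feature = List[float]  # toy stand-in for a feature map (assumption)


def hsi_block(h: Feature) -> Feature:
    """Placeholder for the 3D DCN blocks of the HSI branch (here: a scaling)."""
    return [0.5 * v for v in h]


def x_block(x: Feature) -> Feature:
    """Placeholder for the 2D DCN blocks of the X branch (here: a shift)."""
    return [v + 1.0 for v in x]


def cmfex(h: Feature, x: Feature, alpha: float = 0.1) -> Tuple[Feature, Feature]:
    """Placeholder exchange: each branch receives a share `alpha` of the other.

    `alpha` is a hypothetical parameter, not taken from the paper.
    """
    h_out = [a + alpha * b for a, b in zip(h, x)]
    x_out = [b + alpha * a for a, b in zip(h, x)]
    return h_out, x_out


def ffm(h: Feature, x: Feature) -> Feature:
    """Placeholder fusion: element-wise mean of the two branches."""
    return [(a + b) / 2 for a, b in zip(h, x)]


def all_mlp_decoder(fused: List[Feature]) -> Feature:
    """Placeholder decoder: element-wise mean over the per-stage fused outputs."""
    n = len(fused)
    return [sum(vals) / n for vals in zip(*fused)]


def comix_forward(
    hsi: Feature, x: Feature, num_stages: int = 4
) -> Tuple[Feature, List[Feature]]:
    """Run the toy pipeline; returns the prediction and the per-stage fused outputs."""
    fused_per_stage: List[Feature] = []
    for _ in range(num_stages):
        hsi = hsi_block(hsi)              # modality-specific features (HSI)
        x = x_block(x)                    # modality-specific features (X)
        hsi, x = cmfex(hsi, x)            # recalibrate and exchange
        fused_per_stage.append(ffm(hsi, x))  # fuse; branches continue onward
    return all_mlp_decoder(fused_per_stage), fused_per_stage


prediction, stages = comix_forward([2.0, 4.0], [0.0, 0.0])
print(len(stages), len(prediction))
```

In this sketch the exchanged branch features, rather than the fused feature, are what each branch carries into the next stage. The abstract does not say which of the two is forwarded, so that choice is an assumption.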