Keywords
Artificial intelligence
Computer science
Image fusion
Computer vision
Modality
Fusion
Image (mathematics)
Pattern recognition (psychology)
Authors
Man Zhou,Naishan Zheng,Xuanhua He,Danfeng Hong,Jocelyn Chanussot
Identifier
DOI:10.1109/tpami.2024.3475485
Abstract
Multi-modal image fusion aims to generate a fused image by integrating and distinguishing the cross-modality complementary information from multiple source images. While the cross-attention mechanism with global spatial interactions appears promising, it captures only second-order spatial interactions, neglecting higher-order interactions in both the spatial and channel dimensions. This limitation hampers the exploitation of synergies between modalities. To bridge this gap, we introduce a Synergistic High-order Interaction Paradigm (SHIP), designed to systematically investigate spatial fine-grained and global-statistics collaborations between the multi-modal images across two fundamental dimensions: 1) Spatial dimension: we construct spatial fine-grained interactions through element-wise multiplication, mathematically equivalent to global interactions, and then foster high-order formats by iteratively aggregating and evolving complementary information, enhancing both efficiency and flexibility. 2) Channel dimension: expanding on channel interactions with first-order statistics (mean), we devise high-order channel interactions to facilitate the discernment of inter-dependencies between source images based on global statistics. We further introduce an enhanced version of the SHIP model, called SHIP++, which enriches the cross-modality interaction representation through a cross-order attention evolving mechanism, cross-order information integration, and a residual information memorizing mechanism. Harnessing high-order interactions significantly enhances our model's ability to exploit multi-modal synergies, leading to superior performance over state-of-the-art alternatives, as shown through comprehensive experiments across various benchmarks in two significant multi-modal image fusion tasks: pan-sharpening, and infrared and visible image fusion. The source code is publicly available at https://github.com/manman1995/HOIF.
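The abstract's two core ideas — raising spatial interactions beyond second order via iterated element-wise multiplication, and gating channels by global statistics beyond the first-order mean — can be illustrated with a minimal NumPy sketch. This is not the authors' SHIP/SHIP++ implementation (see the linked repository for that); the function names, the aggregate-then-multiply update, and the sigmoid gating here are illustrative assumptions.

```python
import numpy as np

def high_order_spatial_interaction(feat_a, feat_b, order=3):
    """Illustrative sketch: the element-wise (Hadamard) product of two
    modality feature maps is a second-order spatial interaction; each
    further step mixes the running interaction with aggregated
    complementary features, raising the interaction order by one.
    feat_a, feat_b: arrays of shape (C, H, W)."""
    interaction = feat_a * feat_b              # 2nd-order interaction
    for _ in range(order - 2):                 # evolve to higher orders
        interaction = interaction * (feat_a + feat_b)  # aggregate, then multiply
    return interaction

def high_order_channel_interaction(feat_a, feat_b):
    """Illustrative sketch: weight the channels of feat_a using global
    statistics of feat_b, going beyond the first-order statistic (mean)
    by adding a second-order (variance) term."""
    mean_b = feat_b.mean(axis=(1, 2), keepdims=True)                   # 1st-order
    var_b = ((feat_b - mean_b) ** 2).mean(axis=(1, 2), keepdims=True)  # 2nd-order
    weights = 1.0 / (1.0 + np.exp(-(mean_b + var_b)))  # sigmoid channel gate
    return feat_a * weights                    # shape preserved: (C, H, W)
```

Both operations preserve the (C, H, W) feature shape, so such blocks could in principle be stacked inside a fusion backbone; the actual SHIP modules additionally include the cross-order attention evolving and residual memorizing mechanisms described above.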