Computer science
Fusion
Artificial intelligence
Transformer
Computer vision
Electrical engineering
Engineering
Philosophy
Voltage
Linguistics
Authors
Jiayi Ma,Linfeng Tang,Fan Fan,Jun Huang,Xiaoguang Mei,Yong Ma
Source
Journal: IEEE/CAA Journal of Automatica Sinica
[Institute of Electrical and Electronics Engineers]
Date: 2022-06-30
Volume (issue), pages: 9 (7): 1200-1217
Citations: 438
Identifier
DOI:10.1109/jas.2022.105686
Abstract
This study proposes SwinFusion, a novel general image fusion framework based on cross-domain long-range learning and the Swin Transformer. On the one hand, an attention-guided cross-domain module is devised to achieve sufficient integration of complementary information and global interaction. More specifically, the proposed method involves an intra-domain fusion unit based on self-attention and an inter-domain fusion unit based on cross-attention, which mine and integrate long-range dependencies within the same domain and across domains. Through long-range dependency modeling, the network is able to fully implement domain-specific information extraction and cross-domain complementary information integration while maintaining appropriate apparent intensity from a global perspective. In particular, we introduce the shifted-window mechanism into the self-attention and cross-attention, which allows our model to receive images of arbitrary size. On the other hand, the multi-scene image fusion problems are generalized to a unified framework with structure maintenance, detail preservation, and proper intensity control. Moreover, an elaborate loss function, consisting of SSIM loss, texture loss, and intensity loss, drives the network to preserve abundant texture details and structural information and to present optimal apparent intensity. Extensive experiments on both multi-modal image fusion and digital photography image fusion demonstrate the superiority of SwinFusion over state-of-the-art unified image fusion algorithms and task-specific alternatives. Implementation code and pre-trained weights can be accessed at https://github.com/Linfeng-Tang/SwinFusion.
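The difference between the intra-domain (self-attention) and inter-domain (cross-attention) units described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes plain single-head scaled dot-product attention over the tokens of one window, and it omits window shifting, multiple heads, normalization, and residual connections. The names `attention`, `feat_a`, `feat_b`, and the token and channel sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q_src, kv_src, w_q, w_k, w_v):
    """Scaled dot-product attention.

    Queries come from q_src; keys and values come from kv_src.
    q_src is kv_src  -> self-attention (intra-domain).
    q_src != kv_src  -> cross-attention (inter-domain).
    """
    q = q_src @ w_q
    k = kv_src @ w_k
    v = kv_src @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
n_tokens, dim = 16, 8  # assumption: one 4x4 window of 8-channel tokens

# Hypothetical token features from two source images (domain A and domain B).
feat_a = rng.standard_normal((n_tokens, dim))
feat_b = rng.standard_normal((n_tokens, dim))

# Random projection matrices stand in for learned weights.
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))

# Intra-domain: domain A attends to itself.
intra_a, _ = attention(feat_a, feat_a, w_q, w_k, w_v)

# Inter-domain: domain A queries domain B for complementary information.
inter_a, weights_ab = attention(feat_a, feat_b, w_q, w_k, w_v)
```

In this sketch the only change between the two units is the source of the keys and values, which is the essential distinction between self-attention and cross-attention. Each row of `weights_ab` describes how strongly one domain-A token draws on every domain-B token in the window.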