Keywords: computer science; infrared; convolution (computer science); fusion; Transformer; artificial intelligence; computer vision; scale (ratio); optics; voltage; physics; artificial neural network; linguistics; philosophy; quantum mechanics
Authors
Haojie Tang, Yao Qian, Mengliang Xing, Yisheng Cao, Gang Liu
Identifier
DOI: 10.1016/j.optlaseng.2024.108094
Abstract
The image fusion community is thriving on the wave of deep learning, and the most popular fusion methods are usually built upon well-designed network structures. However, most current methods neither fully exploit deep features nor account for long-range dependencies. In this paper, a convolution and vision Transformer-based multi-scale parallel cross fusion network for infrared and visible images (MPCFusion) is proposed. To exploit deeper texture details, a feature extraction module based on convolution and vision Transformer is designed. To correlate the shallow features of the different modalities, a parallel cross-attention module is proposed, in which a parallel-channel model efficiently preserves modality-specific features and a subsequent cross-spatial model ensures information interaction between the modalities. Moreover, a cross-domain attention module based on convolution and vision Transformer is proposed to capture long-range dependencies among the deep features, effectively addressing the loss of global context. Finally, a nest-connection-based decoder reconstructs the fused features. In particular, we design a new texture-guided structural similarity loss function to drive the network to preserve more complete texture details. Extensive experimental results illustrate that MPCFusion delivers excellent fusion performance and generalization capability. The source code will be released at https://github.com/YQ-097/MPCFusion.
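The texture-guided structural similarity loss is only named in the abstract, so the sketch below is one hypothetical reading rather than the authors' formulation: Sobel gradient magnitudes from the infrared and visible inputs serve as a per-pixel texture weight on a windowed SSIM between the fused image and whichever source is locally more textured. Every name here (texture_guided_ssim_loss, the 11x11 Gaussian window, the max-gradient weighting) is an assumption for illustration.

```python
# Hypothetical sketch of a texture-guided structural similarity loss.
# Assumes single-channel (grayscale) inputs of shape (N, 1, H, W) in [0, 1];
# this is NOT the authors' released code.
import torch
import torch.nn.functional as F

def _gaussian_window(size: int = 11, sigma: float = 1.5) -> torch.Tensor:
    # Standard 11x11 Gaussian window, as commonly used in windowed SSIM.
    coords = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = g / g.sum()
    return (g[:, None] @ g[None, :]).view(1, 1, size, size)

def _sobel_magnitude(x: torch.Tensor) -> torch.Tensor:
    # Gradient magnitude as a simple proxy for local texture strength.
    kx = torch.tensor([[-1.0, 0.0, 1.0],
                       [-2.0, 0.0, 2.0],
                       [-1.0, 0.0, 1.0]], device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    return torch.sqrt(F.conv2d(x, kx, padding=1) ** 2 +
                      F.conv2d(x, ky, padding=1) ** 2 + 1e-8)

def _ssim_map(a, b, window, c1=0.01 ** 2, c2=0.03 ** 2):
    # Per-pixel SSIM computed from Gaussian-filtered local statistics.
    pad = window.shape[-1] // 2
    mu_a = F.conv2d(a, window, padding=pad)
    mu_b = F.conv2d(b, window, padding=pad)
    var_a = F.conv2d(a * a, window, padding=pad) - mu_a ** 2
    var_b = F.conv2d(b * b, window, padding=pad) - mu_b ** 2
    cov = F.conv2d(a * b, window, padding=pad) - mu_a * mu_b
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def texture_guided_ssim_loss(fused, ir, vis):
    window = _gaussian_window().to(fused)
    g_ir, g_vis = _sobel_magnitude(ir), _sobel_magnitude(vis)
    # Texture weight: emphasise pixels where either source has strong gradients.
    w = torch.maximum(g_ir, g_vis)
    w = w / (w.amax(dim=(-2, -1), keepdim=True) + 1e-8)
    # Compare the fused image against the locally more textured source.
    mask = (g_ir >= g_vis).float()
    ssim = (mask * _ssim_map(fused, ir, window)
            + (1 - mask) * _ssim_map(fused, vis, window))
    return (w * (1 - ssim)).sum() / (w.sum() + 1e-8)
```

Under these assumptions the loss is used like any PyTorch scalar objective, e.g. `loss = texture_guided_ssim_loss(fused, ir, vis)` followed by `loss.backward()`; the gradient weighting is what biases training toward preserving texture-rich regions.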