Keywords: Computer science; Artificial intelligence; Computer vision; Image fusion; Fusion; Fusion mechanism; Fusion rule; Encoder; Modality (human-computer interaction); Image (mathematics); Feature; Pattern; Pattern recognition; Complementarity
Identifier
DOI:10.1016/j.inffus.2023.102147
Abstract
Multimodal visual information fusion aims to integrate multi-sensor data into a single image that contains more complementary information and fewer redundant features. However, the complementary information is hard to extract, especially for infrared and visible images, between which there is a large modality gap. Common cross-attention modules consider only correlation; image fusion tasks, on the contrary, need to focus on complementarity (uncorrelation). Hence, in this paper, a novel cross-attention mechanism (CAM) is proposed to enhance the complementary information. Furthermore, a fusion scheme based on a two-stage training strategy is presented to generate the fused images. In the first stage, two auto-encoder networks with the same architecture are trained, one for each modality. Then, with the encoders fixed, the CAM and a decoder are trained in the second stage. With the trained CAM, the features extracted from the two modalities are integrated into one fused feature in which the complementary information is enhanced and the redundant features are reduced. Finally, the fused image is generated by the trained decoder. The experimental results illustrate that the proposed fusion method achieves state-of-the-art (SOTA) fusion performance compared with existing fusion networks. The code of our fusion method will be available soon.