Computer science
Image fusion
Artificial intelligence
Transformer
Computer vision
Pattern recognition
Image (mathematics)
Authors
Jun Zhang, Licheng Jiao, Wenping Ma, Fang Liu, Xu Liu, Lingling Li, Puhua Chen, Shuyuan Yang
Identifier
DOI: 10.1109/tmm.2023.3243659
Abstract
Multimodal image fusion is becoming increasingly important for multi-sensor information utilization. However, existing end-to-end image fusion frameworks ignore the integration of a priori knowledge and long-distance dependencies across domains, which hinders network convergence and global image perception in complex scenes. In this paper, a conditional generative adversarial network with a transformer (TCGAN) is proposed for multimodal image fusion. The generator produces a fused image from the content of the source images, while the discriminators distinguish the fused image from each source image. Adversarial training enables the final fused image to preserve the structural and textural details of the cross-modal images simultaneously. In particular, a wavelet fusion module ensures that the inputs retain as much image content from the different domains as possible. The extracted convolutional features interact in a multiscale cross-modal transformer fusion module so that the associated information is fully complemented, which allows the generator to focus on both local and global context. TCGAN fully accounts for the training efficiency of the adversarial process and the integrated retention of redundant information. Experimental results on public datasets show that TCGAN produces fused images with highlighted targets and rich details while converging quickly.
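The page provides no code, so the following is a minimal PyTorch sketch of the cross-modal attention idea the abstract describes: convolutional features from two modalities exchange long-range context through multi-head cross-attention. The class name CrossModalFusionBlock and all parameters are illustrative assumptions, not the authors' implementation, and the sketch covers a single scale only.

```python
# Illustrative sketch of cross-modal transformer fusion (NOT the authors'
# TCGAN code): features from two modalities attend to each other so every
# spatial location can gather long-distance, cross-domain context.
import torch
import torch.nn as nn

class CrossModalFusionBlock(nn.Module):
    """Single-scale cross-attention between two modality feature maps."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        # One attention head set per direction: A queries B, and B queries A.
        self.attn_ab = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)
        self.mix = nn.Linear(2 * dim, dim)  # merge the two enriched branches

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (N, C, H, W) convolutional features from the two
        # source modalities (e.g. infrared and visible).
        n, c, h, w = feat_a.shape
        tok_a = feat_a.flatten(2).transpose(1, 2)  # (N, H*W, C) token sequence
        tok_b = feat_b.flatten(2).transpose(1, 2)
        # Each modality queries the other for complementary information.
        a2, _ = self.attn_ab(self.norm_a(tok_a), tok_b, tok_b)
        b2, _ = self.attn_ba(self.norm_b(tok_b), tok_a, tok_a)
        # Residual connections keep each branch's own content, then mix.
        fused = self.mix(torch.cat([tok_a + a2, tok_b + b2], dim=-1))
        return fused.transpose(1, 2).reshape(n, c, h, w)

if __name__ == "__main__":
    block = CrossModalFusionBlock(dim=64, heads=4)
    ir = torch.randn(2, 64, 32, 32)   # infrared-branch features
    vis = torch.randn(2, 64, 32, 32)  # visible-branch features
    print(block(ir, vis).shape)       # torch.Size([2, 64, 32, 32])
```

In the paper this interaction is described as multiscale and sits inside an adversarially trained generator; the single-scale block above only illustrates how cross-attention can supply the long-distance, cross-domain dependencies the abstract emphasizes.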