Authors
Jing Li, Bin Yang, Lu Bai, Hao Dou, Chang Li, Lingfei Ma
Source
Journal: IEEE Transactions on Instrumentation and Measurement [Institute of Electrical and Electronics Engineers]
Date: 2023-01-01
Volume/issue: 72: 1-14
Citations: 9
Identifier
DOI: 10.1109/tim.2023.3312755
Abstract
Existing transformer-based infrared and visible image fusion methods focus mainly on the intra-modal self-attention correlations within each image, yet they neglect the inter-modal discrepancies at the same position of the two source images, where the information carried by the infrared token and the visible token is unbalanced. We therefore develop a pure transformer fusion model that reconstructs the fused image in the token dimension: it not only perceives intra-modal long-range dependencies through the transformer's self-attention mechanism, but also captures inter-modal attentive correlations in token space. Moreover, to enhance and balance the interaction of inter-modal tokens when fusing corresponding infrared and visible tokens, learnable attentive weights dynamically measure the correlation of inter-modal tokens at the same position. Concretely, infrared and visible tokens are first computed by two independent transformers, owing to their modal difference, to extract intra-modal long-range dependencies. The corresponding infrared and visible tokens are then fused in token space to reconstruct the fused image. In addition, to comprehensively extract multi-scale long-range dependencies and capture the attentive correlations of corresponding multi-modal tokens at different token sizes, we extend the approach to multi-grained token-based fusion. Ablation studies and extensive experiments demonstrate the effectiveness and superiority of our model compared with nine state-of-the-art methods.
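The core fusion step described above — learnable attentive weights that balance corresponding infrared and visible tokens at the same position — can be sketched as follows. This is a minimal illustration under assumed simplifications, not the paper's exact formulation: the learned weights are modeled here as one scalar score per token computed from a learned vector per modality, normalized with a softmax across the two modalities; the names `fuse_tokens`, `w_ir`, and `w_vis` are hypothetical.

```python
import numpy as np

def fuse_tokens(ir_tokens, vis_tokens, w_ir, w_vis):
    """Token-level attentive fusion sketch (assumption, not the
    paper's exact method).

    ir_tokens, vis_tokens: (N, D) token arrays from the two
    independent modality-specific transformers.
    w_ir, w_vis: (D,) learned vectors producing a scalar score
    per token for each modality.
    """
    # Per-position scalar scores for each modality.
    s_ir = ir_tokens @ w_ir            # shape (N,)
    s_vis = vis_tokens @ w_vis         # shape (N,)
    scores = np.stack([s_ir, s_vis])   # shape (2, N)
    # Softmax over the two modalities at each position, so the
    # fusion weights for corresponding tokens sum to one.
    scores = scores - scores.max(axis=0)  # numerical stability
    a = np.exp(scores)
    a = a / a.sum(axis=0)
    # Fuse corresponding infrared and visible tokens position-wise.
    return a[0][:, None] * ir_tokens + a[1][:, None] * vis_tokens

rng = np.random.default_rng(0)
N, D = 4, 8
ir = rng.standard_normal((N, D))
vis = rng.standard_normal((N, D))
fused = fuse_tokens(ir, vis, rng.standard_normal(D), rng.standard_normal(D))
print(fused.shape)
```

In a full model, `w_ir` and `w_vis` would be trained end-to-end with the two transformers, and the same fusion would be repeated at each token granularity to obtain the multi-grained variant.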