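Multispectral image
Computer science
Artificial intelligence
Exploit
Object detection
Modality (human-computer interaction)
Transformer
Pattern recognition (psychology)
Computer vision
Feature extraction
Engineering
Computer security
Electrical engineering
Voltage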
Authors
Qingyun Fang, Han Da-peng, Zhaokui Wang
Source
Journal: Cornell University - arXiv
Date: 2021-01-01
Citations: 13
Identifier
DOI: 10.48550/arxiv.2111.00273
Abstract
Multispectral image pairs can provide combined information, making object detection applications more reliable and robust in the open world. To fully exploit the different modalities, we present a simple yet effective cross-modality feature fusion approach, named Cross-Modality Fusion Transformer (CFT). Unlike prior CNN-based works, our network is guided by the transformer scheme: it learns long-range dependencies and integrates global contextual information in the feature extraction stage. More importantly, by leveraging the self-attention of the transformer, the network can naturally carry out intra-modality and inter-modality fusion simultaneously and robustly capture the latent interactions between the RGB and thermal domains, thereby significantly improving the performance of multispectral object detection. Extensive experiments and ablation studies on multiple datasets demonstrate that our approach is effective and achieves state-of-the-art detection performance. Our code and models are available at https://github.com/DocF/multispectral-object-detection.
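The fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that); it only shows why self-attention over a concatenated token sequence yields intra-modality and inter-modality fusion at once. The function names (`self_attention`, `fuse`), the toy token values, and the use of identity query/key/value projections in place of learned ones are all assumptions made for brevity.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections instead of learned weights.
    Returns the attended tokens and the attention weight matrix.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights

def fuse(rgb_tokens, thermal_tokens):
    """Concatenate both modalities, attend jointly, then split back."""
    n_rgb = len(rgb_tokens)
    outputs, weights = self_attention(rgb_tokens + thermal_tokens)
    return outputs[:n_rgb], outputs[n_rgb:], weights

# Hypothetical toy features: 2 RGB tokens and 3 thermal tokens of dimension 2.
rgb = [[1.0, 0.0], [0.0, 1.0]]
thermal = [[0.5, 0.5], [1.0, 1.0], [0.0, 0.2]]
fused_rgb, fused_thermal, attn = fuse(rgb, thermal)

# For the first RGB token, split its attention mass by modality.
intra_share = sum(attn[0][:len(rgb)])   # RGB -> RGB (intra-modality)
inter_share = sum(attn[0][len(rgb):])   # RGB -> thermal (inter-modality)
print(f"intra={intra_share:.3f} inter={inter_share:.3f}")
```

Because every token attends to every other token in the joined sequence, the attention matrix contains RGB-to-RGB and thermal-to-thermal blocks (intra-modality) alongside RGB-to-thermal and thermal-to-RGB blocks (inter-modality), so one attention pass covers both kinds of fusion.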