Computer science
Modality (human-computer interaction)
Artificial intelligence
Transformer
Pattern
Pattern recognition (psychology)
Discriminative
Feature learning
Mutual information
Computer vision
Authors
Yujian Feng, Jian Yu, Feng Chen, Yimu Ji, Fei Wu, Shangdong Liu, Xiao-Yuan Jing
Identifier
DOI: 10.1109/tmm.2022.3224663
Abstract
Visible-infrared person re-identification (VI Re-ID) aims to match person images of the same identity captured by visible and infrared cameras. Transformer structures have been successfully applied to VI Re-ID. However, previous Transformer-based methods were mainly designed to capture global content information within a single modality and could not simultaneously perceive semantic information across the two modalities from a global perspective. To solve this problem, we propose a novel framework named the cross-modality interaction Transformer (CMIT). It models spatial and sequential features to capture dependencies between long-range features, and it explicitly improves feature discriminability by exchanging information across modalities, which helps to obtain modality-invariant representations. Specifically, CMIT utilizes a cross-modality attention mechanism to enrich the feature representation of each patch token by interacting with the patch tokens of the other modality, and it aggregates the local features of the CNN structure with the global information of the Transformer structure to mine salient feature representations. Furthermore, a modality-discriminative (MD) loss function is proposed to learn the latent consistency between modalities, encouraging within-class compactness inside each modality and between-class separation across modalities. Extensive experiments on two benchmarks demonstrate that our approach outperforms state-of-the-art methods.
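The cross-modality attention described in the abstract can be illustrated with a minimal sketch: queries come from the patch tokens of one modality while keys and values come from the other, so each token is enriched with cross-modal context. This is only an illustration of the general idea under stated assumptions; the module and parameter names (CrossModalityAttention, d_model, n_heads) and the use of PyTorch's nn.MultiheadAttention are hypothetical and not taken from the paper's actual implementation.

```python
# Hedged sketch of cross-modality attention between visible and infrared
# patch tokens. All names below are illustrative assumptions.
import torch
import torch.nn as nn

class CrossModalityAttention(nn.Module):
    """Queries come from one modality; keys/values come from the other,
    so each patch token is enriched with cross-modal context."""
    def __init__(self, d_model=768, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x_query, x_context):
        # x_query:   (B, N, d_model) patch tokens of one modality (e.g. visible)
        # x_context: (B, M, d_model) patch tokens of the other modality (infrared)
        out, _ = self.attn(query=x_query, key=x_context, value=x_context)
        # Residual connection plus layer norm keeps the original token content
        # while mixing in information from the other modality.
        return self.norm(x_query + out)

# Toy usage: enrich visible tokens with infrared context, and vice versa.
vis = torch.randn(2, 196, 768)   # visible patch tokens
ir = torch.randn(2, 196, 768)    # infrared patch tokens
cma = CrossModalityAttention()
vis_enriched = cma(vis, ir)
ir_enriched = cma(ir, vis)
print(vis_enriched.shape, ir_enriched.shape)  # both (2, 196, 768)
```

Running the same module in both directions is one plausible way to exchange information symmetrically between the modalities; the paper's exact token routing, fusion with the CNN branch, and MD loss are not reproduced here.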