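Computer science
Artificial intelligence
Residual
Transformer
Computer vision
Pattern recognition (psychology)
Infrared
Engineering
Algorithm
Voltage
Optics
Electrical engineering
Physics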
Authors
Prodip Kumar Sarker,Qingjie Zhao
Identifier
DOI:10.1016/j.patcog.2024.110288
Abstract
Visible-infrared (VI) person re-identification (Re-ID) is a critical identification task that involves retrieving and matching images of an individual across the infrared and visible imaging modalities. To improve performance, researchers have developed methods to obtain implicit feature information; however, performance degrades when fewer discriminative features are available. To address this issue, we propose a weighted fused cross-attention multi-scale residual vision transformer (WF-CAMReViT) approach that re-identifies the appropriate person from visible-infrared modality images by integrating the cross-attention multi-scale residual vision transformer architecture with Opposition-based Dove Swarm Optimization (ODSO). The proposed framework aims to bridge the domain gap between the visible and infrared modalities and significantly improve re-identification performance. RGB (visible) and infrared (IR) images of persons are gathered from standard datasets, passed through a cross-attention multi-scale residual vision transformer network to extract features, and then fused using a minimal weight. We also propose Opposition-based DSO to find the minimal weight. The weighted fused features are then passed to the final decoder layer of CAMReViT to perceive the characteristics of each modality. In this study, a model-aware enhancement (MAE) loss is developed to improve the modality information capacity of modality-shared features. The experimental results on the SYSU-MM01 and RegDB datasets are then compared with state-of-the-art transformer-based visible-infrared person Re-ID methods to verify the efficacy of the proposed model.
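The abstract does not give the fusion formula or the ODSO update rules, so the following is only a minimal sketch of the general idea: two modality feature vectors are fused with a single scalar weight, and that weight is searched using opposition-based candidates (each random candidate `w` is paired with its mirror `lo + hi - w`). The convex-combination form of `fuse`, the plain random search that stands in for the dove swarm dynamics, and the names `fuse`, `opposite`, `search_weight`, and `toy_cost` are all assumptions made for illustration and are not taken from the paper.

```python
import random


def fuse(vis, ir, w):
    """Fuse two equal-length feature vectors as w*vis + (1-w)*ir (assumed form)."""
    return [w * v + (1.0 - w) * r for v, r in zip(vis, ir)]


def opposite(w, lo=0.0, hi=1.0):
    """Opposition-based learning: mirror a candidate inside the range [lo, hi]."""
    return lo + hi - w


def search_weight(cost, n_candidates=20, lo=0.0, hi=1.0, seed=0):
    """Evaluate random candidates and their opposites; return the cheapest one.

    This is plain random search with opposition, NOT the dove swarm update
    rules, which the abstract does not specify.
    """
    rng = random.Random(seed)
    best_w, best_cost = None, float("inf")
    for _ in range(n_candidates):
        w = rng.uniform(lo, hi)
        for cand in (w, opposite(w, lo, hi)):
            c = cost(cand)
            if c < best_cost:
                best_w, best_cost = cand, c
    return best_w, best_cost


# Toy data: hypothetical 3-dimensional "visible" and "infrared" features.
vis = [1.0, 2.0, 3.0]
ir = [3.0, 2.0, 1.0]
target = [2.5, 2.0, 1.5]  # hypothetical reference used only to define a cost


def toy_cost(w):
    """Squared distance between the fused vector and the toy reference."""
    fused = fuse(vis, ir, w)
    return sum((f - t) ** 2 for f, t in zip(fused, target))


best_w, best_cost = search_weight(toy_cost)
```

In a real pipeline the cost would come from a validation metric or training loss rather than a fixed reference vector; the toy cost here only makes the search behaviour easy to inspect.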