Identification (biology)
Computer science
Artificial intelligence
Infrared
Pattern recognition (psychology)
Unsupervised learning
Machine learning
Optics
Physics
Botany
Biology
Authors
Ancong Wu, Chengzhi Lin, Wei-Shi Zheng
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume/Issue: 1-1
Identifier
DOI: 10.1109/tcsvt.2024.3404786
Abstract
Visible-infrared person re-identification (Re-ID) plays a crucial role in matching people across camera views in darkness and under normal lighting. To reduce annotation cost, it is advantageous to learn a Re-ID model from unlabeled visible-infrared image pairs. However, the large modality gap makes it difficult to discover the underlying cross-modality sample relations. Compared with cross-modality sample pairs in the target domain, it is easier to obtain more single-modality visible image samples from other domains. In this work, we study unsupervised transfer learning to extract modality-shared knowledge from auxiliary unlabeled visible images in a source domain and leverage this knowledge to learn cross-modality matching in the unlabeled target domain. Our framework consists of two stages: RGB-gray asymmetric mutual learning and unsupervised cross-modality self-training. In the first stage, to extract visible-infrared shared information from auxiliary unlabeled visible images, we regard RGB images and the grayscale fake infrared images transformed from them as two views, learning view-shared information while preserving RGB-specific information. Based on an information-theoretic analysis, we learn an RGB-gray feature extractor and further introduce an auxiliary gray feature extractor to quantify RGB-gray shared knowledge. This knowledge is then transferred to the RGB-gray feature extractor without eliminating RGB-specific information. We call this process Cross-Modality Asymmetric Mutual Learning (CMAM). In the second stage, for unsupervised cross-modality self-training in the target domain, we fuse the complementary knowledge of the two models by mutual learning and employ bipartite cross-modality pseudo labeling to alleviate the modality gap. For a more extensive evaluation, we collected a new public multi-modality dataset, SYSU-MM02, constructed from untrimmed videos. Our method achieves state-of-the-art performance on three benchmark datasets.
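The first stage described in the abstract hinges on turning each unlabeled RGB image into a grayscale "fake infrared" counterpart and learning features shared across the two views. The sketch below illustrates that idea only; the encoder, the cosine-similarity objective, and the function names are assumptions made for illustration and do not reproduce the paper's CMAM procedure or its auxiliary gray feature extractor.

```python
# Minimal illustrative sketch (assumptions, not the authors' implementation):
# treat an RGB image and its grayscale transform as two views of the same person
# and pull their features together with a simple cosine-similarity loss.
import torch
import torch.nn.functional as F
from torchvision.transforms.functional import rgb_to_grayscale

def two_view_batch(rgb: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """rgb: (N, 3, H, W) tensor in [0, 1]. Returns the RGB view and a
    3-channel grayscale view that stands in for the infrared modality."""
    gray = rgb_to_grayscale(rgb, num_output_channels=3)
    return rgb, gray

def view_shared_loss(encoder: torch.nn.Module, rgb: torch.Tensor) -> torch.Tensor:
    """Encourage view-shared features: the same image's RGB and grayscale
    embeddings should be close. `encoder` is any network mapping images to
    (N, D) feature vectors; it is a generic stand-in."""
    v_rgb, v_gray = two_view_batch(rgb)
    f_rgb = F.normalize(encoder(v_rgb), dim=1)    # (N, D) RGB features
    f_gray = F.normalize(encoder(v_gray), dim=1)  # (N, D) grayscale features
    return (1.0 - (f_rgb * f_gray).sum(dim=1)).mean()
```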