Authors
Leqi Ding, Lei Liu, Yan Huang, Chenglong Li, Cheng Zhang, Sheng Wang, Liang Wang
Source
Journal: IEEE Transactions on Intelligent Transportation Systems [Institute of Electrical and Electronics Engineers]
Date: 2024-01-16
Volume/Issue: 25 (7): 7673-7686
Cited by: 3
Identifier
DOI: 10.1109/tits.2023.3348599
Abstract
Vehicle Re-IDentification (Re-ID) aims to retrieve the images most similar to a given query vehicle image from a set of images captured by non-overlapping cameras. It plays a crucial role in intelligent transportation systems and has made impressive advancements in recent years. In real-world scenarios, text descriptions of a target vehicle can often be acquired from witness accounts, after which image queries for vehicle Re-ID must be searched for manually, which is time-consuming and labor-intensive. To solve this problem, this paper introduces a new fine-grained cross-modal retrieval task called text-to-image vehicle re-identification, which seeks to retrieve target vehicle images based on given text descriptions. To bridge the significant gap between the language and visual modalities, we propose a novel Multi-scale multi-view Cross-modal Alignment Network (MCANet). In particular, we incorporate view masks and multi-scale features to align image and text features in a progressive way. In addition, we design the Masked Bidirectional InfoNCE (MB-InfoNCE) loss to enhance training stability and make the best use of negative samples. To provide an evaluation platform for text-to-image vehicle re-identification, we create a Text-to-Image Vehicle Re-Identification dataset (T2I VeRi), which contains 2465 image-text pairs from 776 vehicles, with an average sentence length of 26.8 words. Extensive experiments conducted on T2I VeRi demonstrate that MCANet outperforms the current state-of-the-art (SOTA) method by 2.2% in rank-1 accuracy.
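The abstract gives only the name of the MB-InfoNCE loss, not its formulation. As a rough illustration of the general idea, a masked bidirectional InfoNCE can be sketched as a symmetric (image-to-text plus text-to-image) InfoNCE in which masked negatives are excluded from the softmax denominator. The function names, the mask convention (1 = exclude this negative, e.g. an image/text of the same vehicle identity), and the temperature value below are assumptions for illustration, not the paper's implementation:

```python
import math

def masked_infonce(sim, mask, temperature=0.07):
    """One retrieval direction of a masked InfoNCE loss (sketch).

    sim:  n x n matrix (list of lists); sim[i][j] is the similarity between
          query i and candidate j, with the matched pair on the diagonal.
    mask: n x n 0/1 matrix; mask[i][j] = 1 excludes candidate j from the
          negatives of query i (the positive j == i is always kept).
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [sim[i][j] / temperature for j in range(n)]
        # softmax denominator over the positive and the unmasked negatives
        denom = sum(math.exp(l) for j, l in enumerate(logits)
                    if j == i or not mask[i][j])
        # negative log-probability of the positive pair
        total += -(logits[i] - math.log(denom))
    return total / n

def mb_infonce(sim, mask, temperature=0.07):
    """Bidirectional version: average the image->text and text->image losses."""
    sim_t = [list(row) for row in zip(*sim)]
    mask_t = [list(row) for row in zip(*mask)]
    return 0.5 * (masked_infonce(sim, mask, temperature)
                  + masked_infonce(sim_t, mask_t, temperature))
```

Masking all negatives drives the loss to zero, since only the positive pair remains in the denominator; in practice the mask would only remove the negatives that are false matches (e.g. other samples of the same vehicle in the batch), which is one way such a loss can stabilize training.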