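Computer science
Granularity
Relation (database)
Web search query
Information retrieval
Query expansion
Query optimization
Artificial intelligence
Data mining
Search engine
Operating system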
Authors
Xiaoyong Wang, Jianxi Yang
Identifier
DOI:10.1117/1.jei.34.1.013030
Abstract
Text-based person search (TBPS) focuses on matching images of a specific person with a given natural language query. It plays a significant role in intelligent monitoring systems because of its open and convenient query format. The core of TBPS is to extract multi-modal features with different encoders so as to capture the latent relationships between image parts and the relevant words. However, existing methods typically use independent encoders to learn features from each modality and explore only global similarities, which introduces noise from similar textual expressions. These weak positive samples (similar textual sentences) require more refined relation inference for cross-modal retrieval tasks. In addition, previous works have focused on matching different regions of person images, which under-utilizes the more discriminative conditional information in language queries. To address these problems, we propose a multi-granularity relation-aware and conditional queries learning network for TBPS, which extracts relation-aware features with conditional language queries in a multi-granularity manner. Specifically, the conditional query mechanism is designed to make full use of the conditional information in language queries. Meanwhile, a multi-granularity relation-aware network extracts features from different modalities and explores strong relationships between words and images. Comparative experiments against existing works on three publicly available datasets are conducted to demonstrate the effectiveness and superiority of the proposed network.
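The contrast the abstract draws between global similarity and finer word-to-region matching can be illustrated with a small sketch. This is not the authors' network: the feature vectors are toy values, and the function names and the `alpha` mixing weight are hypothetical choices made only to show how a coarse score and a fine-grained score might be combined when ranking a gallery.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def global_similarity(words, regions):
    """Coarse granularity: compare pooled text and pooled image features."""
    return cosine(mean(words), mean(regions))

def fine_similarity(words, regions):
    """Fine granularity: match each word to its best region, then average."""
    return sum(max(cosine(w, r) for r in regions) for w in words) / len(words)

def rank_gallery(words, gallery, alpha=0.5):
    """Rank gallery ids by a weighted mix of both scores (alpha is hypothetical)."""
    scores = {
        pid: alpha * global_similarity(words, regions)
        + (1 - alpha) * fine_similarity(words, regions)
        for pid, regions in gallery.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Toy 2-D features: two query words, two gallery images with two regions each.
query_words = [[1.0, 0.0], [0.0, 1.0]]
gallery = {
    "person_a": [[1.0, 0.0], [0.0, 1.0]],  # has a region for each word
    "person_b": [[1.0, 0.0], [1.0, 0.0]],  # only matches the first word
}
ranking = rank_gallery(query_words, gallery)
print(ranking)
```

In this toy setup the fine-grained score penalizes `person_b` for having no region that matches the second word, which is the kind of distinction a purely global comparison can blur.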