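Transformer
Computer science
Artificial intelligence
Parsing
Pattern recognition (psychology)
Representation (politics)
Knapsack problem
Architecture
Computer vision
Algorithm
Engineering
Voltage
Art
Visual arts
Electrical engineering
Politics
Law
Political science
Authors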
Kuan Zhu, Haiyun Guo, Shiliang Zhang, Yaowei Wang, Jing Liu, Jinqiao Wang, Jing Liu
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems
[Institute of Electrical and Electronics Engineers]
Date: 2023-08-25
Volume/issue/pages: 1-11
Citations: 73
Identifier
DOI:10.1109/tnnls.2023.3301856
Abstract
In person re-identification (re-ID), extracting part-level features from person images has been shown to be crucial for providing fine-grained information. Most existing CNN-based methods either locate the human parts only coarsely or rely on pretrained human parsing models, and so fail to locate identifiable nonhuman parts (e.g., a knapsack). In this article, we introduce an alignment scheme into the transformer architecture for the first time and propose the auto-aligned transformer (AAformer), which automatically locates both human and nonhuman parts at the patch level. We introduce "Part tokens (PARTs)", which are learnable vectors, to extract part features in the transformer. In self-attention, a PART interacts only with a local subset of patches and learns to be the representation of that part. To adaptively group the image patches into different subsets, we design the auto-alignment. Auto-alignment employs a fast variant of the optimal transport (OT) algorithm to cluster the patch embeddings online into several groups, with the PARTs as their prototypes. AAformer integrates the part alignment into the self-attention, and the output PARTs can be used directly as part features for retrieval. Extensive experiments validate the effectiveness of PARTs and the superiority of AAformer over various state-of-the-art methods.
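The abstract does not say which fast OT variant is used or how the attention is restricted, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes Sinkhorn-Knopp iterations with equal-mass marginals as the OT solver, cosine similarity as the patch-to-prototype score, and a hard argmax grouping; the names `sinkhorn_assign`, `part_attention`, `eps`, and `n_iters` are hypothetical, and the learned query/key/value projections of a real transformer layer are omitted.

```python
import numpy as np


def sinkhorn_assign(patches, parts, eps=0.05, n_iters=50):
    """Softly assign N patch embeddings to K part prototypes.

    Assumption: Sinkhorn-Knopp with uniform marginals stands in for the
    "fast variant of OT" that the abstract mentions.
    """
    # Cosine similarity between L2-normalised patches (N, D) and prototypes (K, D).
    p = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    c = parts / np.linalg.norm(parts, axis=1, keepdims=True)
    scores = p @ c.T  # shape (N, K)
    # Subtracting the max keeps exp() stable; it only rescales the plan.
    q = np.exp((scores - scores.max()) / eps)
    n, k = q.shape
    for _ in range(n_iters):
        q /= q.sum(axis=0, keepdims=True) * k  # each column now sums to 1/K
        q /= q.sum(axis=1, keepdims=True) * n  # each row now sums to 1/N
    return q * n  # rescale so every row sums to 1


def part_attention(patches, parts, labels):
    """Update each PART by attending only to the patches grouped with it.

    Assumption: single head, no learned projections, hard grouping.
    """
    d = patches.shape[1]
    out = parts.copy()
    for k in range(parts.shape[0]):
        idx = np.flatnonzero(labels == k)
        if idx.size == 0:
            continue  # no patch in this group: leave the PART unchanged
        logits = patches[idx] @ parts[k] / np.sqrt(d)
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()
        out[k] = weights @ patches[idx]
    return out


# Toy usage: 32 patch embeddings and 4 PARTs, both 16-dimensional.
rng = np.random.default_rng(0)
patch_emb = rng.normal(size=(32, 16))
part_tokens = rng.normal(size=(4, 16))
soft = sinkhorn_assign(patch_emb, part_tokens)
groups = soft.argmax(axis=1)  # one group index per patch
part_features = part_attention(patch_emb, part_tokens, groups)
```

The equal-mass constraint is what separates this from plain nearest-prototype assignment: it discourages all patches from collapsing onto a single PART, which is one plausible reason to use OT for the grouping step.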