Keywords: computer science; Transformer (deep learning); discriminative features; encoding; artificial intelligence; pattern recognition; feature extraction; layer (deep learning)
Authors
Ruyi Ji,Jiaying Li,Libo Zhang,Jing Liu,Yanjun Wu
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology
[Institute of Electrical and Electronics Engineers]
Date: 2023-09-01
Volume/Issue: 33 (9): 5009-5021
Citations: 3
Identifier
DOI: 10.1109/tcsvt.2023.3248791
Abstract
Fine-grained visual classification (FGVC) requires distinguishing sub-categories within the same super-category, a task that suffers from small inter-class and large intra-class variances. This paper aims to improve FGVC performance with a novel dual Transformer framework (coined Dual-TR) with multi-grained assembly. Dual-TR encodes fine-grained objects through two parallel hierarchies, which is amenable to capturing subtle yet discriminative cues via the self-attention mechanism in ViT. Specifically, we perform orthogonal multi-grained assembly within the Transformer structure for a more robust representation, i.e., intra-layer and inter-layer assembly. The former explores informative features across the self-attention heads within a Transformer layer; the latter assembles tokens across Transformer layers. Meanwhile, we introduce a center-loss constraint to enforce intra-class compactness and inter-class separability. Extensive experiments show that Dual-TR performs on par with state-of-the-art methods on four public benchmarks: CUB-200-2011, NABirds, iNaturalist2017, and Stanford Dogs. Comprehensive ablation studies further demonstrate the effectiveness of the architectural design choices.
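The center-loss constraint mentioned in the abstract penalizes the distance between each sample's feature vector and the running center of its class, which tightens intra-class clusters. A minimal NumPy sketch of that loss term follows; the function and variable names are illustrative, not taken from the paper, and the class centers are given directly here rather than learned alongside the network.

```python
import numpy as np

def center_loss(features, labels, centers):
    """Center loss: half the mean squared distance between each
    sample's feature vector and its class center. Illustrative
    sketch only; in training, centers are updated jointly with
    the network parameters."""
    diffs = features - centers[labels]            # (N, D) residuals
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))

# Toy example: three 2-D features, two classes.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]])
labels = np.array([0, 0, 1])
centers = np.array([[1.0, 0.0], [-1.0, 0.0]])
loss = center_loss(feats, labels, centers)
```

In practice this term is weighted and added to the standard cross-entropy classification loss, so gradients both separate classes and compact each class around its center.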