Cross-domain Detection Transformer based on Spatial-aware and Semantic-aware Token Alignment

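Computer science; Security token; Transformer; Artificial intelligence; Object detection; Encoder; Discriminator; Data mining; Machine learning; Pattern recognition (psychology); Computer network; Telecommunications; Physics; Quantum mechanics; Voltage; Detector; Operating system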
Authors
Jinhong Deng, Xiaoyue Zhang, Wen Li, Lixin Duan, Dong Xu
Source
Journal: IEEE Transactions on Multimedia [Institute of Electrical and Electronics Engineers]
Volume/Issue: : 1-12 Cited by: 1
Identifier
DOI:10.1109/tmm.2023.3330524
Abstract

Detection transformers such as DETR [1] have recently exhibited promising performance for many object detection tasks, but the generalization ability of those methods is still quite limited for cross-domain adaptation scenarios. To address the cross-domain issue, a straightforward method is to perform token alignment with adversarial training in transformers. However, its performance is often unsatisfactory because the tokens in detection transformers are quite diverse and represent different spatial and semantic information. In this paper, we propose a new method for cross-domain detection transformers called spatial-aware and semantic-aware token alignment (SSTA). Specifically, we take advantage of the characteristics of cross-attention as used in the detection transformer and propose spatial-aware token alignment (SpaTA) and semantic-aware token alignment (SemTA) strategies to guide the token alignment across domains. For spatial-aware token alignment, we extract the information from the cross-attention map (CAM) to align the distribution of tokens according to their attention to object queries. For semantic-aware token alignment, we inject the category information into the cross-attention map and construct domain embedding to guide the learning of a multi-class discriminator to model the category relationship and achieve category-level token alignment during the entire adaptation process. We conduct extensive experiments on several widely-used benchmarks, and the results clearly show the effectiveness of our proposed approach over existing state-of-the-art methods.
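
The spatial-aware idea in the abstract, weighting token alignment by how strongly object queries attend to each token, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the cross-attention map is converted into weights, so taking the maximum over queries is an assumption, and the linear discriminator and the names `token_weights` and `weighted_domain_loss` are hypothetical. A real adversarial setup would also backpropagate through the encoder, typically with gradient reversal, which is omitted here.

```python
import math


def token_weights(cross_attn):
    """Turn a cross-attention map into one weight per token.

    cross_attn[q][t] is the attention of object query q to token t.
    Assumption: a token's weight is its maximum attention over all
    queries, normalized so the weights sum to 1.
    """
    num_tokens = len(cross_attn[0])
    raw = [max(row[t] for row in cross_attn) for t in range(num_tokens)]
    total = sum(raw)
    return [r / total for r in raw]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def weighted_domain_loss(tokens, cross_attn, w, b, domain_label):
    """Attention-weighted binary cross-entropy of a linear domain discriminator.

    tokens: list of feature vectors, one per token.
    w, b: weights and bias of the (hypothetical) linear discriminator.
    domain_label: 0 for the source domain, 1 for the target domain.
    """
    weights = token_weights(cross_attn)
    loss = 0.0
    for tok, wt in zip(tokens, weights):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, tok)) + b)
        p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp for numerical safety
        bce = -(domain_label * math.log(p)
                + (1 - domain_label) * math.log(1.0 - p))
        loss += wt * bce  # tokens that queries attend to count more
    return loss


if __name__ == "__main__":
    attn = [[0.1, 0.9], [0.3, 0.7]]        # 2 queries x 2 tokens
    feats = [[1.0, 2.0], [3.0, 4.0]]       # 2 tokens x 2 features
    print(token_weights(attn))
    print(weighted_domain_loss(feats, attn, [0.0, 0.0], 0.0, 1))
```

With an all-zero discriminator every token is scored 0.5, so the weighted loss equals ln 2 regardless of the attention map; the attention weights only start to matter once the discriminator separates tokens from the two domains.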
