From query to prompt: towards open-world perception
Computer science
Leapfrog surveillance
Leverage (statistics)
Information retrieval
Perception
Query expansion
Artificial intelligence
Data mining
Biology
Neuroscience
Author
Hao Zhang
Identifier
DOI:10.14711/thesis-991013340355303412
Abstract
Most contemporary perception models use Transformer-based architectures, such as DETR for object detection and Mask2Former for image segmentation. These frameworks extract objects from image features by formulating queries, which makes query design central to their performance. In this dissertation, we explore integrating locality priors into the global attention mechanism through new query designs in DN-DETR and DINO. These designs include: (1) formulating queries as anchor boxes; (2) predicting relative object locations at each decoder layer; (3) an auxiliary denoising task that refines queries toward object bounding boxes; and (4) a query initialization strategy coupled with a selection process. These advancements...
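To make designs (2) and (3) in the abstract concrete, here is a minimal sketch in plain Python. It is not the dissertation's implementation. It assumes boxes are normalized `(cx, cy, w, h)` tuples, that each decoder layer predicts an offset that is added to the anchor in logit (inverse-sigmoid) space, and that denoising queries are built by jittering ground-truth boxes. The function names and the `center_scale` and `size_scale` parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def inverse_sigmoid(x, eps=1e-5):
    """Map a value in (0, 1) to logit space, clamping to avoid log(0)."""
    x = min(max(x, eps), 1.0 - eps)
    return math.log(x / (1.0 - x))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def refine_box(anchor, delta):
    """One layer's update: add the predicted relative offset in logit space.

    `anchor` is the box carried by the query; `delta` stands in for the
    offset a decoder layer would predict (assumed here, not computed).
    """
    return tuple(sigmoid(inverse_sigmoid(a) + d) for a, d in zip(anchor, delta))


def noise_box(gt_box, center_scale=0.4, size_scale=0.4, rng=random):
    """Jitter a ground-truth box to form a denoising query (illustrative).

    The center moves by at most center_scale * (w / 2, h / 2) and each side
    is rescaled by a factor in [1 - size_scale, 1 + size_scale].
    """
    cx, cy, w, h = gt_box
    cx += rng.uniform(-1.0, 1.0) * center_scale * w / 2
    cy += rng.uniform(-1.0, 1.0) * center_scale * h / 2
    w *= 1.0 + rng.uniform(-1.0, 1.0) * size_scale
    h *= 1.0 + rng.uniform(-1.0, 1.0) * size_scale

    def clamp(v):
        return min(max(v, 0.0), 1.0)

    return (clamp(cx), clamp(cy), clamp(w), clamp(h))


if __name__ == "__main__":
    gt = (0.5, 0.5, 0.2, 0.3)
    query = noise_box(gt, rng=random.Random(0))
    # Stand-in offsets for three decoder layers (zeros leave the box unchanged).
    for delta in [(0.1, -0.1, 0.0, 0.0), (0.05, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]:
        query = refine_box(query, delta)
    print(query)
```

In a trained model the offsets would come from the decoder, and the denoising branch would be supervised to reconstruct the original ground-truth box from its noised copy. Here the offsets are fixed stand-ins so the sketch stays self-contained.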