Computer science
Artificial intelligence
Computer vision
Ground truth
Robustness
Transformer
Visual odometry
Pose
Odometry
Mobile robot
Robot
Authors
Zhuoyuan Wu,Jun Cai,Ranran Huang,X. Liu,Zhenhua Chai
Identifier
DOI:10.1007/978-981-99-8076-5_10
Abstract
Visual relocalization, which predicts the 6-DoF camera pose of a query image, is a crucial technique in visual odometry and SLAM. Existing works mainly focus on ground-level views in indoor or outdoor scenes, while camera relocalization on unmanned aerial vehicles has received less attention; frequent view changes and large depths of field also make it more challenging. In this work, we establish a Bird's-Eye-View (BEV) dataset for camera relocalization: a large-scale dataset of 177,242 images covering four distinct scenes (roof, farmland, bare ground, and urban area) with challenging conditions such as frequent view changes, repetitive or weak textures, and large depths of field. Every image in the dataset is associated with a ground-truth camera pose. We also propose a Progressive Temporal transFormer (dubbed PTFormer) as the baseline model. PTFormer is a sequence-based transformer with a progressive temporal aggregation module that exploits temporal correlation, and parallel absolute and relative prediction heads that implicitly model the temporal constraint. Thorough experiments on both the BEV dataset and the widely used handheld datasets 7Scenes and Cambridge Landmarks demonstrate the robustness of the proposed method.
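The parallel absolute and relative prediction heads rest on a standard rigid-body identity: the relative pose between two frames is determined by their absolute poses, so the two heads can be checked (or penalized) against each other. The abstract does not give PTFormer's exact formulation, so the sketch below is only an illustration of that consistency constraint, assuming 4x4 homogeneous pose matrices and the hypothetical helper names `pose_matrix`, `relative_pose`, and `consistency_error`:

```python
import numpy as np

def pose_matrix(R, t):
    """Assemble a 4x4 homogeneous camera pose from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def relative_pose(T_a, T_b):
    """Relative pose mapping frame a to frame b: T_rel = T_b @ inv(T_a)."""
    return T_b @ np.linalg.inv(T_a)

def consistency_error(T_abs_a, T_abs_b, T_rel_pred):
    """Discrepancy between a predicted relative pose and the relative pose
    implied by two absolute predictions -- the kind of temporal constraint
    parallel absolute/relative heads can enforce implicitly."""
    implied = relative_pose(T_abs_a, T_abs_b)
    diff = np.linalg.inv(implied) @ T_rel_pred   # identity when consistent
    trans_err = np.linalg.norm(diff[:3, 3])
    # rotation angle of the residual rotation, via its trace
    rot_err = np.arccos(np.clip((np.trace(diff[:3, :3]) - 1.0) / 2.0, -1.0, 1.0))
    return trans_err, rot_err
```

For perfectly consistent predictions, both error terms vanish; a training loss on this residual ties the two heads together without supervising relative poses directly.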