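Landmark
Association (psychology)
Modal verb
Object (grammar)
Artificial intelligence
Computer vision
Computer science
Data association
Cartography
Geography
Psychology
Chemistry
Probabilistic logic
Polymer chemistry
Psychotherapist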
Authors
Shigemichi Matsuzaki,Takuma Sugino,Kazuhito Tanaka,Zijun Sha,Shintaro Nakaoka,Shintaro Yoshizawa,Kazuhiro Shintani
Source
Journal: Cornell University - arXiv
Date: 2024-02-08
Identifiers
DOI:10.48550/arxiv.2402.06092
Abstract
This paper describes a multi-modal data association method for global localization using object-based maps and camera images. In global localization (relocalization) with object-based maps, existing methods typically match all possible combinations of detected objects and landmarks of the same object category, then extract inliers using RANSAC or brute-force search. This approach becomes infeasible as the number of landmarks increases, because the number of correspondence candidates grows exponentially. In this paper, we propose labeling landmarks with natural-language descriptions and extracting correspondences based on their conceptual similarity to image observations, using a Vision Language Model (VLM). By leveraging detailed text information, our approach extracts correspondences more efficiently than methods that use only object categories. Experiments show that the proposed method achieves more accurate global localization with fewer iterations than baseline methods, demonstrating its efficiency.
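To make the contrast between category-only matching and similarity-based association concrete, here is a minimal toy sketch. It is not the authors' implementation. It assumes that the VLM's image-to-text similarity can be stood in for by cosine similarity between hypothetical embedding vectors, and that pruning keeps the `top_k` most similar landmarks per detection. The landmark names, embeddings, and the `top_k` parameter are all illustrative. The sketch only counts the one-to-one correspondence hypotheses that a RANSAC or brute-force search would have to draw from; it does not estimate a pose.

```python
import math
from itertools import product


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def category_candidates(detections, landmarks):
    """Baseline: every landmark sharing the detection's category is a candidate."""
    return [[lm["id"] for lm in landmarks if lm["category"] == det["category"]]
            for det in detections]


def similarity_candidates(detections, landmarks, top_k=1):
    """Hypothetical stand-in for VLM association: rank all landmarks by the
    similarity between the detection's image embedding and the landmark's
    description embedding, keeping only the top_k per detection."""
    result = []
    for det in detections:
        ranked = sorted(landmarks, key=lambda lm: cosine(det["emb"], lm["emb"]),
                        reverse=True)
        result.append([lm["id"] for lm in ranked[:top_k]])
    return result


def count_hypotheses(candidates):
    """Number of one-to-one assignments (one landmark per detection,
    no landmark reused) that an inlier search would have to consider."""
    return sum(1 for combo in product(*candidates) if len(set(combo)) == len(combo))


# Toy map: illustrative embeddings standing in for encoded text descriptions.
landmarks = [
    {"id": "chair_red", "category": "chair", "emb": [1.0, 0.0, 0.0]},
    {"id": "chair_blue", "category": "chair", "emb": [0.0, 1.0, 0.0]},
    {"id": "chair_wood", "category": "chair", "emb": [0.0, 0.0, 1.0]},
    {"id": "chair_black", "category": "chair", "emb": [0.7, 0.7, 0.0]},
    {"id": "table_glass", "category": "table", "emb": [0.0, 0.6, 0.8]},
    {"id": "table_oak", "category": "table", "emb": [0.8, 0.0, 0.6]},
]
# Toy image detections: illustrative embeddings standing in for encoded crops.
detections = [
    {"category": "chair", "emb": [0.9, 0.1, 0.0]},
    {"category": "chair", "emb": [0.1, 0.0, 0.95]},
    {"category": "table", "emb": [0.05, 0.55, 0.8]},
]

if __name__ == "__main__":
    base = category_candidates(detections, landmarks)
    sim = similarity_candidates(detections, landmarks, top_k=1)
    print("category-only hypotheses:", count_hypotheses(base))  # 24
    print("similarity top-1 hypotheses:", count_hypotheses(sim))  # 1
```

With only three detections, category matching already yields 4 × 3 × 2 = 24 joint hypotheses. Because that count multiplies across detections, it grows exponentially with their number, which is the blow-up the abstract describes. Ranking by description similarity collapses the toy case to a single hypothesis. A real system would keep more than one candidate per detection to tolerate similarity errors, trading some of that reduction for robustness.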