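Computer science
Pipeline (software)
Segmentation
Object detection
Artificial intelligence
Object (grammar)
Foundation (evidence)
Computer vision
Projectile
Encoding (set theory)
Programming language
History
Archaeology
Set (abstract data type)
Organic chemistry
Chemistry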
Authors
Dingyuan Zhang,Dingkang Liang,Hsiao‐Shan Yang,Zhikang Zou,Xiaoqing Ye,Zhe Liu,Xiang Bai
Source
Journal: Cornell University - arXiv
Date: 2023-01-01
Citations: 1
Identifier
DOI: 10.48550/arxiv.2306.02245
Abstract
With the development of large language models, many remarkable linguistic systems like ChatGPT have thrived and achieved astonishing success on many tasks, showing the incredible power of foundation models. In the spirit of unleashing the capability of foundation models on vision tasks, the Segment Anything Model (SAM), a vision foundation model for image segmentation, has been proposed recently and presents strong zero-shot ability on many downstream 2D tasks. However, whether SAM can be adapted to 3D vision tasks has yet to be explored, especially 3D object detection. With this inspiration, we explore adapting the zero-shot ability of SAM to 3D object detection in this paper. We propose a SAM-powered BEV processing pipeline to detect objects and get promising results on the large-scale Waymo open dataset. As an early attempt, our method takes a step toward 3D object detection with vision foundation models and presents the opportunity to unleash their power on 3D vision tasks. The code is released at https://github.com/DYZhang09/SAM3D.
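The abstract does not spell out how the BEV (bird's-eye-view) processing works, so the following is only a minimal, generic sketch of one common preprocessing step: rasterizing 3D points into a top-down grid, the kind of 2D image a segmentation model such as SAM could then take as input. It is not the authors' pipeline; the function name `points_to_bev`, the ranges, and the cell size are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def points_to_bev(
    points: Sequence[Tuple[float, float, float]],
    x_range: Tuple[float, float] = (0.0, 4.0),
    y_range: Tuple[float, float] = (-2.0, 2.0),
    cell: float = 1.0,
) -> List[List[int]]:
    """Rasterize (x, y, z) points into a BEV grid of per-cell point counts.

    Hypothetical helper for illustration only: rows index x (forward),
    columns index y (lateral), and z is ignored.
    """
    n_rows = int(round((x_range[1] - x_range[0]) / cell))
    n_cols = int(round((y_range[1] - y_range[0]) / cell))
    grid = [[0] * n_cols for _ in range(n_rows)]

    for x, y, _z in points:
        # Skip points that fall outside the chosen BEV window.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Clamp guards against floating-point rounding at the upper edge.
        row = min(int((x - x_range[0]) / cell), n_rows - 1)
        col = min(int((y - y_range[0]) / cell), n_cols - 1)
        grid[row][col] += 1

    return grid
```

A real pipeline would typically encode more than counts per cell (for example height or intensity) and would map the segmentation masks back to 3D boxes; those steps are outside what the abstract states and are not attempted here.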