LiDAR
Computer science
Benchmark
Computer vision
Artificial intelligence
Object detection
Fusion
Modality (human-computer interaction)
Pipeline (software)
Transformation
Object
Encoding
Detector
Pattern recognition
Remote sensing
Telecommunications
Biochemistry
Chemistry
Geodesy
Engineering
Set (abstract data type)
Geology
Electrical engineering
Gene
Programming language
Geography
Authors
Yichen Xie, Chenfeng Xu, Marie-Julie Rakotosaona, Patrick Rim, Federico Tombari, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan
Identifier
DOI: 10.1109/iccv51070.2023.01613
Abstract
By identifying four important components of existing LiDAR-camera 3D object detection methods (LiDAR and camera candidates, transformation, and fusion outputs), we observe that all existing methods either find dense candidates or yield dense representations of scenes. However, given that objects occupy only a small part of a scene, finding dense candidates and generating dense representations is noisy and inefficient. We propose SparseFusion, a novel multi-sensor 3D detection method that exclusively uses sparse candidates and sparse representations. Specifically, SparseFusion utilizes the outputs of parallel detectors in the LiDAR and camera modalities as sparse candidates for fusion. We transform the camera candidates into the LiDAR coordinate space by disentangling the object representations. Then, we can fuse the multi-modality candidates in a unified 3D space by a lightweight self-attention module. To mitigate negative transfer between modalities, we propose novel semantic and geometric cross-modality transfer modules that are applied prior to the modality-specific detectors. SparseFusion achieves state-of-the-art performance on the nuScenes benchmark while also running at the fastest speed, even outperforming methods with stronger backbones. We perform extensive experiments to demonstrate the effectiveness and efficiency of our modules and overall method pipeline. Our code will be made publicly available at https://github.com/yichen928/SparseFusion.
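The fusion step the abstract describes, concatenating per-modality sparse candidates that already live in a shared LiDAR coordinate space and mixing them with a lightweight self-attention module, can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository); the class name SparseCandidateFusion, the feature width, and the candidate counts are all hypothetical.

import torch
import torch.nn as nn

class SparseCandidateFusion(nn.Module):
    # Hypothetical sketch: fuse sparse object candidates from two
    # modality-specific detectors with one self-attention layer.
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lidar_feats, camera_feats):
        # lidar_feats:  (B, N_l, dim) candidates from the LiDAR detector
        # camera_feats: (B, N_c, dim) camera candidates, assumed already
        #               transformed into the LiDAR coordinate space
        tokens = torch.cat([lidar_feats, camera_feats], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)
        # Residual connection plus normalization over the fused tokens
        return self.norm(tokens + fused)

# Usage: fuse 200 LiDAR and 200 camera candidates of width 256
module = SparseCandidateFusion(dim=256)
out = module(torch.randn(2, 200, 256), torch.randn(2, 200, 256))
print(out.shape)  # torch.Size([2, 400, 256])

Because both inputs are short candidate lists rather than dense feature maps, the attention here runs over a few hundred tokens per scene, which is what makes this style of fusion cheap relative to dense bird's-eye-view fusion.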