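Computer science
Artificial intelligence
Computer vision
Grid
Perception
Pairwise comparison
Feature learning
Segmentation
Encoding (set theory)
Set (abstract data type)
Geography
Geodesy
Biology
Neuroscience
Programming language
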
Authors
Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai

Source
Journal: Cornell University - arXiv
Date: 2022-01-01
Times cited: 1

Identifiers
DOI: 10.48550/arxiv.2203.17270

Abstract
3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV (bird's-eye-view) representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design spatial cross-attention, in which each BEV query extracts spatial features from the regions of interest across camera views. For temporal information, we propose temporal self-attention to recurrently fuse historical BEV information. Our approach achieves a new state-of-the-art NDS of 56.9% on the nuScenes test set, 9.0 points higher than the previous best methods and on par with the performance of LiDAR-based baselines. We further show that BEVFormer markedly improves the accuracy of velocity estimation and the recall of objects under low-visibility conditions. The code is available at https://github.com/zhiqi-li/BEVFormer.
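The abstract describes spatial cross-attention only at a high level. The sketch below illustrates one plausible reading of the geometry behind "each BEV query extracts features from the regions of interest across camera views", under simplifying assumptions that are not taken from the source. Each cell of an ego-centred BEV grid is lifted to a vertical pillar of hypothetical reference heights, every pillar point is projected into pinhole cameras assumed to sit at the ego origin, and a query only gathers features from the views its pillar lands in. The learned attention that BEVFormer actually uses is replaced here by a plain average of values returned by stand-in feature functions, so `bev_grid_centers`, `project`, `hit_views`, `aggregate`, the reference heights, and the camera parameters are all hypothetical.

```python
import math

# Hypothetical reference heights (metres) used to lift each BEV cell to a pillar.
HEIGHTS = (-1.0, 0.0, 1.0, 2.0)


def bev_grid_centers(h, w, cell):
    """(x, y) centres in metres of an h-by-w BEV grid centred on the ego vehicle.
    Assumed ego frame: x forward, y left, z up."""
    return [((j + 0.5 - w / 2) * cell, (i + 0.5 - h / 2) * cell)
            for i in range(h) for j in range(w)]


def project(p, cam, min_depth=0.1):
    """Pinhole projection of ego-frame point p = (x, y, z) into a camera that
    sits at the ego origin and looks along heading cam["yaw"] (simplification).
    Returns pixel (u, v), or None if behind the camera or outside the image."""
    x, y, z = p
    c, s = math.cos(cam["yaw"]), math.sin(cam["yaw"])
    depth = x * c + y * s   # distance along the optical axis
    right = x * s - y * c   # image-right axis
    down = -z               # image-down axis
    if depth < min_depth:
        return None
    u = cam["f"] * right / depth + cam["cx"]
    v = cam["f"] * down / depth + cam["cy"]
    if 0 <= u < cam["w"] and 0 <= v < cam["h"]:
        return (u, v)
    return None


def hit_views(xy, cams, heights=HEIGHTS):
    """Names of the cameras in which at least one pillar point of the query lands."""
    return {name for name, cam in cams.items()
            if any(project((xy[0], xy[1], z), cam) is not None for z in heights)}


def aggregate(xy, cams, feats, heights=HEIGHTS):
    """Average the values sampled at every projected pillar point.
    feats[name](u, v) -> float stands in for a camera feature map; this plain
    average is a stand-in for the learned attention, not the paper's operator."""
    samples = []
    for name, cam in cams.items():
        for z in heights:
            uv = project((xy[0], xy[1], z), cam)
            if uv is not None:
                samples.append(feats[name](*uv))
    return sum(samples) / len(samples) if samples else 0.0


# Two hypothetical 1600x900 cameras facing forward and backward.
cams = {
    "front": {"yaw": 0.0, "f": 800.0, "cx": 800.0, "cy": 450.0, "w": 1600, "h": 900},
    "back": {"yaw": math.pi, "f": 800.0, "cx": 800.0, "cy": 450.0, "w": 1600, "h": 900},
}

print(hit_views((10.0, 0.0), cams))  # {'front'}
```

In this sketch, a cell 10 m ahead is seen only by the front camera, and a cell directly to the left is seen by neither, so its query gathers nothing. Restricting each query to the views its pillar projects into means it only samples from cameras that can actually see that patch of ground.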
         
            
 
                 
                
                    