Computer vision
Artificial intelligence
Computer science
Perception
Machine vision
Transformer
Engineering
Psychology
Electrical engineering
Voltage
Neuroscience
Authors
Runsheng Xu, Chia-Ju Chen, Zhengzhong Tu, Ming-Hsuan Yang
Identifier
DOI: 10.1109/TPAMI.2024.3479222
Abstract
In this paper, we study the application of Vehicle-to-Everything (V2X) communication to improve the perception performance of autonomous vehicles. We present V2X-ViTs, a robust cooperative perception framework for V2X communication built on novel vision Transformer models. First, we present V2X-ViTv1, which contains holistic attention modules that effectively fuse information across on-road agents (i.e., vehicles and infrastructure). Specifically, V2X-ViTv1 consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention, which capture inter-agent interaction and per-agent spatial relationships, respectively. These key modules are designed in a unified Transformer architecture to handle common V2X challenges, including asynchronous information sharing, pose errors, and the heterogeneity of V2X components. Second, we propose an advanced architecture, V2X-ViTv2, with improved multi-scale perception ability, along with data augmentation techniques tailored to V2X applications to further improve performance. We construct a large-scale V2X perception dataset using CARLA and OpenCDA to validate our approach. Extensive experimental results on both synthetic and real-world datasets show that V2X-ViTs achieve state-of-the-art 3D object detection performance and remain robust even in harsh, noisy environments. All code and trained models will be available at https://github.com/DerrickXuNu/OpenCOOD.
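To make the alternating-layer idea in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of one block that first attends across agents at each spatial location and then attends within local spatial windows for each agent. All class names, tensor shapes, and hyperparameters here are assumptions for illustration only; the authors' actual V2X-ViT implementation (including the heterogeneous and multi-scale details) is in the OpenCOOD repository linked above.

import torch
import torch.nn as nn


class MultiAgentSelfAttention(nn.Module):
    """Sketch: attends across agents (vehicles/infrastructure) at every spatial cell."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, agents, H, W, dim); each spatial cell becomes a sequence of agents
        b, a, h, w, d = x.shape
        seq = x.permute(0, 2, 3, 1, 4).reshape(b * h * w, a, d)
        out, _ = self.attn(seq, seq, seq)
        out = self.norm(seq + out)
        return out.reshape(b, h, w, a, d).permute(0, 3, 1, 2, 4)


class WindowSelfAttention(nn.Module):
    """Sketch: attends within local spatial windows for each agent (single scale only)."""

    def __init__(self, dim: int, window: int = 4, heads: int = 8):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, agents, H, W, dim); H and W assumed divisible by the window size
        b, a, h, w, d = x.shape
        ws = self.window
        seq = (x.reshape(b * a, h // ws, ws, w // ws, ws, d)
                 .permute(0, 1, 3, 2, 4, 5)
                 .reshape(-1, ws * ws, d))
        out, _ = self.attn(seq, seq, seq)
        out = self.norm(seq + out)
        return (out.reshape(b * a, h // ws, w // ws, ws, ws, d)
                   .permute(0, 1, 3, 2, 4, 5)
                   .reshape(b, a, h, w, d))


class V2XBlock(nn.Module):
    """One alternating layer: inter-agent fusion, then per-agent spatial attention."""

    def __init__(self, dim: int):
        super().__init__()
        self.agent_attn = MultiAgentSelfAttention(dim)
        self.window_attn = WindowSelfAttention(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.window_attn(self.agent_attn(x))


if __name__ == "__main__":
    feats = torch.randn(1, 3, 32, 32, 64)   # dummy BEV features from 3 agents
    print(V2XBlock(64)(feats).shape)         # torch.Size([1, 3, 32, 32, 64])

Stacking several such blocks would interleave inter-agent and spatial reasoning, which is the structural pattern the abstract describes.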