Computer science
Human multitasking
Encoder
Perception
Architecture
Visual perception
Human-computer interaction
Computer vision
Artificial intelligence
Real-time computing
Cognitive psychology
Operating system
Neuroscience
Psychology
Art
Visual arts
Authors
Muhammad Usman, Zaka-Ud-Din Muhammad, Qiang Ling
Identifier
DOI: 10.1016/j.eswa.2024.123249
Abstract
Visual perception plays a vital role in autonomous driving systems, demanding both high accuracy and real-time inference speed to ensure safety. In this paper, we propose a multi-task framework that simultaneously performs object detection, drivable area segmentation, and lane line identification, addressing the requirements of accurate and efficient visual perception. Our approach uses a shared-encoder architecture with three separate decoders, one per task. We investigate three configurations for the shared encoder: a Convolutional Neural Network (CNN), a Pyramid Vision Transformer (PVT), and a hybrid CNN+PVT model. Through extensive experiments and comparative analysis on the challenging BDD100K dataset, we evaluate the performance of these shared-encoder models and provide insights into their strengths and weaknesses. Our research contributes to the advancement of multi-task visual perception for autonomous driving by achieving competitive results in terms of accuracy and efficiency. The source code is publicly available on GitHub to facilitate further research in this domain.
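To make the shared-encoder, three-decoder layout described in the abstract concrete, here is a minimal PyTorch sketch. It is an illustration only, not the paper's implementation: all class names (ToyCNNEncoder, SegHead, DetHead, MultiTaskPerception), channel widths, and the toy CNN backbone are assumptions, and the paper's actual encoder variants (CNN, Pyramid Vision Transformer, or the CNN+PVT hybrid) and decoder designs should be taken from its public GitHub repository.

```python
# Minimal sketch of a shared encoder feeding three task-specific decoders.
# Everything here (module names, channel sizes, the toy backbone) is an
# illustrative assumption, not the paper's actual architecture.
import torch
import torch.nn as nn


class ToyCNNEncoder(nn.Module):
    """Stand-in for the shared backbone (CNN / PVT / hybrid in the paper)."""
    def __init__(self, out_channels: int = 256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, out_channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Shared feature map (stride 8), computed once and reused by all heads.
        return self.features(x)


class SegHead(nn.Module):
    """Simple upsampling head, used here for drivable area and lane line masks."""
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, num_classes, 1),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
        )

    def forward(self, feats):
        return self.head(feats)


class DetHead(nn.Module):
    """Toy dense detection head: per-location class scores plus box offsets."""
    def __init__(self, in_channels: int, num_classes: int = 10):
        super().__init__()
        self.cls = nn.Conv2d(in_channels, num_classes, 1)
        self.box = nn.Conv2d(in_channels, 4, 1)

    def forward(self, feats):
        return self.cls(feats), self.box(feats)


class MultiTaskPerception(nn.Module):
    """Shared encoder with three separate decoders, one per task."""
    def __init__(self):
        super().__init__()
        self.encoder = ToyCNNEncoder()   # swap in a PVT or CNN+PVT hybrid here
        self.det_head = DetHead(256)     # object detection
        self.da_head = SegHead(256, 2)   # drivable area segmentation
        self.ll_head = SegHead(256, 2)   # lane line identification

    def forward(self, x):
        feats = self.encoder(x)          # single forward pass shared by all tasks
        return {
            "detection": self.det_head(feats),
            "drivable_area": self.da_head(feats),
            "lane_lines": self.ll_head(feats),
        }


if __name__ == "__main__":
    model = MultiTaskPerception()
    out = model(torch.randn(1, 3, 384, 640))
    print(out["drivable_area"].shape)  # torch.Size([1, 2, 384, 640])
```

The design point this sketch captures is why such frameworks can meet real-time constraints: the (expensive) encoder runs once per frame, and only the lightweight task heads are duplicated, rather than running three independent networks.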