Topics
Pedestrian
Encoder
Computer Science
Benchmark
Artificial Intelligence
Context
Trajectory
Machine Learning
Feature
Human–Computer Interaction
Computer Vision
Engineering
Transportation Engineering
Authors
Neha Sharma, Chhavi Dhiman, S. Indu
Source
Journal: IEEE Sensors Journal (Institute of Electrical and Electronics Engineers)
Date: 2023-11-15
Volume/Issue: 23 (22): 27540-27548
Identifier
DOI: 10.1109/jsen.2023.3317426
Abstract
The capability to comprehend the intention of pedestrians on the road is one of the most crucial skills that current autonomous vehicles (AVs) are striving for in order to become fully autonomous. In recent years, multimodal methods that employ trajectory, appearance, and context for predicting pedestrian crossing intention have gained traction. However, most existing works still lack rich feature-representation ability in a multimodal scenario, which restricts their performance. Moreover, little emphasis has been placed on pedestrian interactions with the surroundings when predicting short-term pedestrian intention in challenging ego-centric vision. To address these challenges, an efficient visual–motion–interaction-guided (VMI) intention prediction framework is proposed. The framework comprises a visual encoder (VE), a motion encoder (ME), and an interaction encoder (IE) to capture rich multimodal features of the pedestrian and its interactions with the surroundings, followed by temporal attention and an adaptive fusion module (AFM) to integrate these multimodal features efficiently. The proposed framework outperforms several state-of-the-art (SOTA) methods on the benchmark datasets Pedestrian Intention Estimation (PIE) and Joint Attention in Autonomous Driving (JAAD), with accuracy, AUC, F1-score, precision, and recall of 0.92/0.89, 0.91/0.90, 0.87/0.81, 0.86/0.79, and 0.88/0.83, respectively. Furthermore, extensive experiments investigate different fusion architectures and the design parameters of all encoders. The proposed VMI framework predicts pedestrian crossing intention 2.5 s ahead of the crossing event. Code is available at: https://github.com/neha013/VMI.git.
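The abstract outlines a three-stream design: per-modality encoders (VE, ME, IE), temporal attention over each encoded sequence, and an adaptive fusion step that combines the streams before classification. The sketch below is one plausible reading of that pipeline in PyTorch, not the authors' implementation (which is in the linked repository): the GRU-based encoders, all tensor sizes, and the softmax-gate realization of "adaptive fusion" are assumptions made here purely for illustration.

```python
# Minimal sketch of a VMI-style three-stream intention predictor.
# Encoder types, feature sizes, and the gating-based fusion are
# assumptions for illustration; see https://github.com/neha013/VMI.git
# for the authors' actual architecture.
import torch
import torch.nn as nn


class TemporalAttention(nn.Module):
    """Score each time step and pool the sequence into one vector."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                        # x: (B, T, D)
        w = torch.softmax(self.score(x), dim=1)  # attention over time
        return (w * x).sum(dim=1)                # (B, D)


class VMISketch(nn.Module):
    def __init__(self, vis_dim=512, mot_dim=4, inter_dim=64, hidden=128):
        super().__init__()
        # Visual encoder (VE): GRU over per-frame appearance features,
        # e.g. pooled CNN features of the pedestrian crop (assumed input).
        self.ve = nn.GRU(vis_dim, hidden, batch_first=True)
        # Motion encoder (ME): GRU over bounding-box tracks (x, y, w, h).
        self.me = nn.GRU(mot_dim, hidden, batch_first=True)
        # Interaction encoder (IE): GRU over per-frame descriptors of the
        # pedestrian's interactions with its surroundings (assumed input).
        self.ie = nn.GRU(inter_dim, hidden, batch_first=True)
        self.att = nn.ModuleList([TemporalAttention(hidden) for _ in range(3)])
        # Adaptive fusion: learned per-sample weights over the three streams.
        self.gate = nn.Sequential(nn.Linear(3 * hidden, 3), nn.Softmax(dim=-1))
        self.head = nn.Linear(hidden, 2)  # logits: crossing vs. not crossing

    def forward(self, vis, mot, inter):
        feats = []
        for enc, att, x in zip((self.ve, self.me, self.ie), self.att,
                               (vis, mot, inter)):
            h, _ = enc(x)          # (B, T, hidden)
            feats.append(att(h))   # temporally attended summary, (B, hidden)
        g = self.gate(torch.cat(feats, dim=-1))               # (B, 3)
        fused = sum(g[:, i:i + 1] * f for i, f in enumerate(feats))
        return self.head(fused)


# Usage with random tensors: 16 clips of 15 observed frames each.
model = VMISketch()
logits = model(torch.randn(16, 15, 512),   # visual features
               torch.randn(16, 15, 4),     # bounding boxes
               torch.randn(16, 15, 64))    # interaction features
print(logits.shape)  # torch.Size([16, 2])
```

In this reading, adaptive fusion is a softmax gate that weights the three stream summaries per sample; plain concatenation followed by an MLP would be the obvious alternative, and the paper's ablations over fusion architectures compare choices of exactly this kind.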