Ang Feng, Ruiqi Qiu, Jinglong Wang, Jun Gong, Yang Yi, Mingtao Dong
Source
Journal: IEEE Robotics and Automation Letters; Date: 2024-03-01; Volume (Issue): 9 (3): 2224-2231
Identifier
DOI:10.1109/lra.2024.3351002
Abstract
Predicting pedestrians' future trajectories is a key challenge in the ego-centric view of autonomous driving systems. Most current methods struggle to capture the features of subtle changes while keeping the model lightweight. To solve this problem, we propose a multimodal forward generation transformer network based on an encoder-decoder structure. Unlike the traditional transformer, we improve layer normalization and propose frame normalization, which captures minute time-variant properties more successfully. In addition, we believe that considering pedestrians' short-term future goals can help the ego-vehicle predict more accurate and reasonable long-term pedestrian trajectories. Therefore, based on the idea of forward generation, the decoder considers the short-term future targets and uses a trajectory-time correlation module to capture the relationship between the estimated short-term future goals and the global spatial-temporal context cues of the historical trajectory. Our model is evaluated on the JAAD and PIE datasets and achieves state-of-the-art performance while maintaining a lightweight model size.