Pedestrian
Trajectory
Computer science
Artificial intelligence
Engineering
Transportation engineering
Physics
Astronomy
Authors
Yijun Wang, Zhiqiang Guo, Chang Xu, Jianxin Lin
Identifier
DOI:10.1016/j.knosys.2024.112038
Abstract
Pedestrian trajectory prediction from the first-person view remains one of the challenging problems in autonomous driving due to the difficulty of understanding and predicting pedestrian actions. Observing that pedestrian motion naturally contains the repetitive pattern of the gait cycle as well as global intention information, we design a Multimodal Stepwise-Coordinating Network, namely MSCN, to fully leverage these underlying human motion properties. Specifically, we first design a multimodal spatial-frequency encoder, which encodes the periodicity of pedestrian motion with a frequency-domain enhanced Transformer and other visual information with a spatial-domain Transformer. Then, we propose a stepwise-coordinating decoder structure, which leverages both local and global information in sequence decoding through a two-stage decoding process. After generating a coarse sequence with the stepwise trajectory predictor, a coordinator aggregates the representations used to generate that coarse sequence. The coordinator then learns to output a refined sequence through a knowledge distillation process based on the aggregated representations. In this way, MSCN can adequately capture the representations of short-term motion behaviors and thus better model long-term sequence prediction. Extensive experiments show that the proposed model achieves significant improvements over state-of-the-art approaches on the PIE and JAAD datasets, by 16.1% and 16.4% respectively.
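The coarse-then-refine decoding idea described in the abstract can be illustrated with a minimal sketch. The stand-ins below are hypothetical: `stepwise_predict` replaces the paper's learned stepwise trajectory predictor with a constant-velocity rollout, and `coordinate` replaces the coordinator's distillation-based refinement with a simple smoothing over the aggregated coarse predictions. Neither reflects the actual MSCN implementation.

```python
import numpy as np

def stepwise_predict(obs, horizon):
    """Stage 1 (coarse): autoregressive one-step rollout.
    A constant-velocity stand-in for the stepwise trajectory predictor."""
    traj = list(obs)
    for _ in range(horizon):
        step = traj[-1] - traj[-2]      # last observed displacement
        traj.append(traj[-1] + step)    # extend one step at a time
    return np.array(traj[len(obs):])

def coordinate(coarse):
    """Stage 2 (refine): aggregate neighboring coarse predictions,
    a stand-in for the coordinator's refinement of the coarse sequence."""
    padded = np.vstack([coarse[:1], coarse, coarse[-1:]])
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0

# Observed 2-D pedestrian positions (x, y) over three frames.
obs = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])
coarse = stepwise_predict(obs, horizon=4)   # coarse future sequence
refined = coordinate(coarse)                # refined future sequence
```

The key structural point is that refinement consumes the whole coarse sequence at once, giving it global context that the step-by-step predictor lacks.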