Prediction (artificial intelligence)
Computer science
Action (physics)
Artificial intelligence
Machine learning
Action recognition
Graphics
Theoretical computer science
Quantum mechanics
Physics
Class (philosophy)
Authors
Yuqi Zhang, Xiucheng Li, Hao Xie, Weijun Zhuang, Shihui Guo, Zhijun Li
Source
Journal: IEEE Transactions on Image Processing
Publisher: Institute of Electrical and Electronics Engineers
Date: 2024-01-01
Volume 33, pp. 3242-3255
Identifier
DOI: 10.1109/TIP.2024.3391692
Abstract
With human action anticipation becoming an essential tool for many practical applications, there has been an increasing trend toward developing more accurate anticipation models in recent years. Most existing methods target standard action anticipation datasets, on which they can produce promising results by learning action-level contextual patterns. However, the over-simplified scenarios of standard datasets often do not hold in reality, which hinders these methods from being applied in real-world settings. To address this, we propose SEAD, a novel scene-graph-based model that learns action anticipation at a higher semantic level rather than focusing on the action level. The proposed model is composed of two main modules: 1) the scene prediction module, which predicts future scene graphs using a grammar dictionary, and 2) the action anticipation module, which predicts future actions with an LSTM network that takes the observed and predicted scene graphs as input. We evaluate our model on two real-world video datasets (Charades and Home Action Genome) as well as a standard action anticipation dataset (CAD-120) to verify its efficacy. The experimental results show that SEAD outperforms existing methods by large margins on the two real-world datasets while also yielding stable predictions on the standard dataset. In particular, our proposed model surpasses the state-of-the-art methods with mean average precision improvements consistently higher than 65% on the Charades dataset and an average improvement of 40.6% on the Home Action Genome dataset.
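The abstract describes a two-stage pipeline whose second stage feeds observed and predicted scene graphs into an LSTM for action anticipation. The paper's actual module design is not specified here, so the following is only a minimal PyTorch sketch of that idea: the graph encoder, layer sizes, and the 157-class multi-label head (assumed to match Charades' action vocabulary) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ActionAnticipationLSTM(nn.Module):
    """Hypothetical sketch of an LSTM-based action anticipation module:
    it consumes a sequence of scene-graph embeddings (observed frames
    followed by predicted ones) and scores future action classes.
    All layer sizes are assumptions for illustration."""

    def __init__(self, graph_dim=256, hidden_dim=512, num_actions=157):
        super().__init__()
        # Maps a flattened scene-graph feature to the LSTM input space.
        self.graph_encoder = nn.Linear(graph_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        # Multi-label head: Charades-style mAP evaluation scores each
        # action class independently (sigmoid per class at inference).
        self.classifier = nn.Linear(hidden_dim, num_actions)

    def forward(self, graph_feats):
        # graph_feats: (batch, seq_len, graph_dim), where seq_len spans
        # both observed and predicted scene graphs.
        x = torch.relu(self.graph_encoder(graph_feats))
        out, _ = self.lstm(x)
        # Use the final hidden state to anticipate upcoming actions.
        return self.classifier(out[:, -1])

# Usage: e.g., 8 observed + 4 predicted scene-graph embeddings per clip.
feats = torch.randn(2, 12, 256)
logits = ActionAnticipationLSTM()(feats)  # (2, 157) action scores
```

The design choice illustrated is the one the abstract emphasizes: anticipation operates on scene-graph representations rather than on raw action labels, so the same recurrent head can be driven by either observed or predicted future scene structure.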