Concepts
Computer science, Artificial intelligence, Feature (linguistics), Focus (optics), Deep learning, Task (project management), Pattern recognition (psychology), Correlation, Layer (electronics), Feature extraction, Machine learning, Visual attention, Action (physics), Perception, Philosophy, Physics, Economics, Neuroscience, Organic chemistry, Chemistry, Management, Optics, Biology, Quantum mechanics, Linguistics, Mathematics, Geometry
Authors
Cheng Dai, Xingang Liu, Jinfeng Lai
Identifier
DOI: 10.1016/j.asoc.2019.105820
Abstract
It is well known that different frames play different roles in feature learning for video-based human action recognition. However, most existing deep learning models assign the same weight to different visual and temporal cues during parameter training, which severely limits the discriminability of the learned features. To address this problem, this paper exploits the visual attention mechanism and proposes an end-to-end two-stream attention-based LSTM network. The network selectively focuses on the effective regions of the original input images and assigns different levels of attention to the outputs of each deep feature map. Moreover, to exploit the correlation between the two deep feature streams, a deep feature correlation layer is proposed that adjusts the network parameters according to a correlation judgement. Finally, we evaluate the approach on three different datasets, and the experimental results show that it achieves state-of-the-art performance in common scenarios.
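The abstract describes the architecture only at a high level, so the following is a minimal illustrative sketch, not the authors' released code: a two-stream (RGB and optical-flow) attention-based LSTM in PyTorch, where per-frame CNN features are re-weighted by a learned temporal attention and the two stream outputs are fused by a correlation-driven gate standing in for the paper's deep feature correlation layer. The layer sizes, the cosine-similarity gating, and the class count (101, as in UCF101) are assumptions made for clarity.

```python
# Hypothetical sketch of a two-stream attention-based LSTM for action
# recognition; layer sizes and the correlation-gated fusion are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionLSTMStream(nn.Module):
    """One stream: per-frame CNN features -> temporal attention -> LSTM."""

    def __init__(self, feat_dim=2048, hidden_dim=512):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)   # scores each frame's feature vector
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, feats):                # feats: (B, T, feat_dim)
        scores = self.attn(feats).squeeze(-1)       # (B, T)
        weights = F.softmax(scores, dim=1)          # attention over time steps
        attended = feats * weights.unsqueeze(-1)    # re-weight frame features
        out, _ = self.lstm(attended)                # (B, T, hidden_dim)
        return out[:, -1, :]                        # last hidden state as clip feature


class TwoStreamAttentionLSTM(nn.Module):
    """Spatial (RGB) and temporal (flow) streams fused by a correlation-weighted sum."""

    def __init__(self, feat_dim=2048, hidden_dim=512, num_classes=101):
        super().__init__()
        self.rgb_stream = AttentionLSTMStream(feat_dim, hidden_dim)
        self.flow_stream = AttentionLSTMStream(feat_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, rgb_feats, flow_feats):
        h_rgb = self.rgb_stream(rgb_feats)
        h_flow = self.flow_stream(flow_feats)
        # Stand-in for the deep feature correlation layer: cosine similarity
        # between the two stream features gates how strongly they are mixed.
        corr = F.cosine_similarity(h_rgb, h_flow, dim=1, eps=1e-8)   # (B,)
        alpha = torch.sigmoid(corr).unsqueeze(-1)                    # (B, 1), in (0, 1)
        fused = alpha * h_rgb + (1.0 - alpha) * h_flow
        return self.classifier(fused)


if __name__ == "__main__":
    # Random tensors stand in for pre-extracted CNN features of 16-frame clips.
    model = TwoStreamAttentionLSTM()
    rgb = torch.randn(4, 16, 2048)
    flow = torch.randn(4, 16, 2048)
    logits = model(rgb, flow)
    print(logits.shape)   # torch.Size([4, 101])
```

In this sketch the gate is a scalar per clip; the paper's correlation layer may instead adjust parameters channel-wise or during training, which the abstract does not specify.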