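Computer science
Convolutional neural network
Artificial intelligence
Gesture
Pose
Computer vision
Exploit
Segmentation
Process (computing)
Deep learning
Sign (mathematics)
Artificial neural network
Pattern recognition (psychology)
Gesture recognition
Feature extraction
Feature (linguistics)
Speech recognition
Computer security
Mathematical analysis
Mathematics
Operating system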
Authors
Tomas Pfister, Karen Simonyan, James Charles, Andrew Zisserman
Identifier
DOI:10.1007/978-3-319-16865-4_35
Abstract
Our objective is to efficiently and accurately estimate the upper body pose of humans in gesture videos. To this end, we build on the recent successful applications of deep convolutional neural networks (ConvNets). Our novelties are: (i) our method is the first to our knowledge to use ConvNets for estimating human pose in videos; (ii) a new network that exploits temporal information from multiple frames, leading to better performance; (iii) showing that pre-segmenting the foreground of the video improves performance; and (iv) demonstrating that even without foreground segmentations, the network learns to abstract away from the background and can estimate the pose even in the presence of a complex, varying background. We evaluate our method on the BBC TV Signing dataset and show that our pose predictions are significantly better, and an order of magnitude faster to compute, than the state of the art [3].