Keywords
Computer science, Upsampling, Artificial intelligence, Segmentation, Pooling, Convolutional neural network, Classifier (UML), Pattern recognition (psychology), Encoder, Action recognition, Recurrent neural network, Deep learning, Machine learning, Class (philosophy), Artificial neural network, Image (mathematics), Operating system
Authors
C. H. Lea, M. D. Flynn, René Vidal, Austin Reiter, Gregory D. Hager
Source
Journal: Cornell University - arXiv
Date: 2016-01-01
Citations: 6
Identifiers
DOI: 10.48550/arxiv.1611.05267
Abstract
The ability to identify and temporally segment fine-grained human actions throughout a video is crucial for robotics, surveillance, education, and beyond. Typical approaches decouple this problem by first extracting local spatiotemporal features from video frames and then feeding them into a temporal classifier that captures high-level temporal patterns. We introduce a new class of temporal models, which we call Temporal Convolutional Networks (TCNs), that use a hierarchy of temporal convolutions to perform fine-grained action segmentation or detection. Our Encoder-Decoder TCN uses pooling and upsampling to efficiently capture long-range temporal patterns whereas our Dilated TCN uses dilated convolutions. We show that TCNs are capable of capturing action compositions, segment durations, and long-range dependencies, and are over an order of magnitude faster to train than competing LSTM-based Recurrent Neural Networks. We apply these models to three challenging fine-grained datasets and show large improvements over the state of the art.
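The abstract contrasts two TCN variants without giving their structure, so the following PyTorch sketch may help make the contrast concrete. It is not the authors' released implementation: the class names, channel width (hidden=64), kernel sizes, and layer counts are illustrative assumptions, and both variants here use centered (acausal) padding for simplicity.

```python
# Minimal sketches of the two TCN variants named in the abstract.
# Hyperparameters below are illustrative assumptions, not the paper's values.
import torch
import torch.nn as nn

class EncoderDecoderTCN(nn.Module):
    """Encoder-Decoder TCN: temporal pooling halves the sequence at each
    encoder stage and upsampling restores it in the decoder, so deeper
    layers see progressively longer temporal context."""
    def __init__(self, in_channels, num_classes, hidden=64, kernel_size=25):
        super().__init__()
        pad = kernel_size // 2  # keep sequence length through each conv
        self.enc1 = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size, padding=pad),
            nn.ReLU(), nn.MaxPool1d(2))
        self.enc2 = nn.Sequential(
            nn.Conv1d(hidden, hidden, kernel_size, padding=pad),
            nn.ReLU(), nn.MaxPool1d(2))
        self.dec1 = nn.Sequential(
            nn.Upsample(scale_factor=2),
            nn.Conv1d(hidden, hidden, kernel_size, padding=pad), nn.ReLU())
        self.dec2 = nn.Sequential(
            nn.Upsample(scale_factor=2),
            nn.Conv1d(hidden, hidden, kernel_size, padding=pad), nn.ReLU())
        self.classifier = nn.Conv1d(hidden, num_classes, 1)  # per-frame scores

    def forward(self, x):            # x: (batch, features, time), time % 4 == 0
        x = self.enc2(self.enc1(x))  # time/4 after two pooling stages
        x = self.dec2(self.dec1(x))  # upsampled back to the original length
        return self.classifier(x)    # (batch, num_classes, time)

class DilatedTCN(nn.Module):
    """Dilated TCN: stacked convolutions with exponentially growing dilation
    widen the receptive field without any pooling or upsampling."""
    def __init__(self, in_channels, num_classes, hidden=64, kernel_size=3, levels=4):
        super().__init__()
        layers, ch = [], in_channels
        for i in range(levels):
            d = 2 ** i  # dilation 1, 2, 4, 8, ...
            layers += [nn.Conv1d(ch, hidden, kernel_size, dilation=d,
                                 padding=d * (kernel_size - 1) // 2),
                       nn.ReLU()]
            ch = hidden
        self.body = nn.Sequential(*layers)
        self.classifier = nn.Conv1d(hidden, num_classes, 1)

    def forward(self, x):  # x: (batch, features, time)
        return self.classifier(self.body(x))

# Usage: per-frame class scores for a 2-minute clip at 30 fps (3600 frames).
x = torch.randn(1, 128, 3600)
print(EncoderDecoderTCN(128, 10)(x).shape)  # torch.Size([1, 10, 3600])
print(DilatedTCN(128, 10)(x).shape)         # torch.Size([1, 10, 3600])
```

In both variants every layer is a 1D convolution over time, which is why, unlike an LSTM, the whole sequence can be processed in parallel during training.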