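Computer science
Redundancy (engineering)
Artificial intelligence
Attention network
Convolution (computer science)
Sampling (signal processing)
Segmentation
Ambiguity
Perspective (graphical)
Feature (linguistics)
Pattern recognition (psychology)
Boundary (topology)
Feature extraction
Computer vision
Artificial neural network
Mathematics
Mathematical analysis
Linguistics
Philosophy
Filter (signal processing)
Programming language
Operating system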
Authors
Zhuben Dong, Yunheng Li, Yiwei Sun, Conghui Hao, Kaiyuan Liu, Tao Sun, Shenglan Liu
Identifier
DOI:10.1109/icme52920.2022.9859819
Abstract
Locating action segments in long, untrimmed videos is a sub-task of video understanding that is attracting growing research attention. Two difficult problems in this task are ambiguous action boundaries and over-segmentation errors. To address them, we build on the MS-TCN family of models and propose the Double Attention Network based on Sparse Sampling (DASS). First, we design a Seq2Seq Convolution Sampling Network (SCSN) that reduces feature redundancy and also helps mitigate over-fitting. Second, we devise a Global Temporal Attention Module (GTAM) that uses a global view of the sequence to help predict action boundaries and make post-processing more effective. Third, we propose a Local Temporal Attention Module (LTAM), which focuses attention on neighboring frames and restores fine-grained details lost in layers with large dilation rates. Experiments on three challenging datasets, 50Salads, GTEA, and Breakfast, show that our model achieves state-of-the-art performance.
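The abstract does not describe the internals of SCSN, so the following is only a minimal sketch of the general idea of sparse temporal sampling, assuming a (T, C) sequence of per-frame features. A strided 1-D convolution over time keeps roughly one step per `stride` frames, and nearest-neighbor upsampling maps the result back to one row per frame so frame-wise labels still line up. The function names, kernel size, stride, and random weights are all hypothetical choices for illustration, not the paper's design.

```python
import numpy as np

def strided_temporal_conv(x, weight, stride):
    """Downsample a (T, C_in) feature sequence with a 1-D convolution over time.

    weight has shape (K, C_in, C_out). Frames are zero-padded so every output
    step sees K input frames. Hypothetical stand-in for a sparse-sampling encoder.
    """
    T = x.shape[0]
    k, _, c_out = weight.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    starts = range(0, T, stride)                  # keep every `stride`-th frame position
    out = np.empty((len(starts), c_out))
    for i, t in enumerate(starts):
        window = xp[t:t + k]                      # (K, C_in) window around frame t
        out[i] = np.einsum("kc,kco->o", window, weight)
    return out

def upsample_nearest(y, T, stride):
    """Repeat each sampled step `stride` times and trim back to T frames."""
    return np.repeat(y, stride, axis=0)[:T]

# Usage: 100 frames of 8-dim features, sampled 4x more sparsely in time.
rng = np.random.default_rng(0)
T, C_in, C_out, stride = 100, 8, 4, 4
x = rng.standard_normal((T, C_in))
w = rng.standard_normal((3, C_in, C_out)) * 0.1
y = strided_temporal_conv(x, w, stride)   # 25 time steps instead of 100
z = upsample_nearest(y, T, stride)        # back to one row per frame
```

Working on 25 steps instead of 100 is one plausible way to cut redundancy between near-identical neighboring frames; a learned upsampler would likely replace the nearest-neighbor repeat in practice.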
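GTAM is described only as taking a global perspective on the sequence. A common way to give every frame a global view is scaled dot-product self-attention over all T frames, sketched below under that assumption. The projection matrices `wq`, `wk`, `wv`, the residual connection, and the function name are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def global_temporal_attention(x, wq, wk, wv):
    """Every frame attends to every frame in the sequence (T x T weights)."""
    q, k, v = x @ wq, x @ wk, x @ wv          # queries/keys (T, d), values (T, C)
    scores = q @ k.T / np.sqrt(q.shape[1])    # (T, T) frame-to-frame similarity
    attn = softmax(scores, axis=-1)           # each row sums to 1
    return x + attn @ v, attn                 # residual keeps the input features

# Usage: 50 frames of 16-dim features.
rng = np.random.default_rng(0)
T, C, d = 50, 16, 8
x = rng.standard_normal((T, C))
wq, wk = rng.standard_normal((C, d)), rng.standard_normal((C, d))
wv = rng.standard_normal((C, C)) * 0.1
out, attn = global_temporal_attention(x, wq, wk, wv)
```

Because each row of `attn` mixes information from the whole video, a frame near an ambiguous boundary can draw on context far outside any convolution's receptive field. The cost is O(T²) memory, which is one reason sparse sampling before attention is attractive for long videos.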
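In MS-TCN-style layers the dilation grows with depth, so a deep layer combines frames t-d, t, and t+d while skipping everything in between. The abstract says LTAM focuses attention on neighboring frames to restore such details; the sketch below assumes a simple banded self-attention in which frame t may only attend to frames within `radius` steps. The band mask, `radius`, the weight matrices, and the function name are hypothetical illustrations, not the paper's LTAM.

```python
import numpy as np

def local_temporal_attention(x, wq, wk, wv, radius):
    """Each frame attends only to frames within `radius` steps of itself."""
    T = x.shape[0]
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])                # (T, T)
    idx = np.arange(T)
    band = np.abs(idx[:, None] - idx[None, :]) <= radius  # True inside the local window
    scores = np.where(band, scores, -np.inf)              # block attention outside the window
    scores -= scores.max(axis=1, keepdims=True)           # diagonal is always inside, so max is finite
    weights = np.exp(scores)                              # exp(-inf) = 0 outside the window
    weights /= weights.sum(axis=1, keepdims=True)
    return x + weights @ v, weights

# Usage: 40 frames, each attending to a 7-frame neighborhood (radius 3).
rng = np.random.default_rng(1)
T, C, d = 40, 16, 8
x = rng.standard_normal((T, C))
wq, wk = rng.standard_normal((C, d)) * 0.1, rng.standard_normal((C, d)) * 0.1
wv = rng.standard_normal((C, C)) * 0.1
out, weights = local_temporal_attention(x, wq, wk, wv, radius=3)
```

Unlike a dilated convolution, the band covers every frame in the neighborhood densely, which is the property the abstract relies on for complementing highly dilated layers. Setting `radius` to at least T - 1 recovers full global attention.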