Authors
Bin Zhao,Haopeng Li,Xiaoqiang Lu,Xuelong Li
Identifier
DOI:10.1109/tpami.2021.3072117
Abstract
Exploiting inner-shot and inter-shot dependencies is essential for key-shot-based video summarization. Current approaches are mainly devoted to modeling the video as a frame sequence with recurrent neural networks. However, one limitation of sequence models is that they focus on capturing local neighborhood dependencies, while high-order, long-distance dependencies are not fully exploited. In general, the frames within a shot record a single activity and vary smoothly over time, whereas multi-hop relationships occur frequently among shots. In this case, both the local and global dependencies are important for understanding the video content. Motivated by this observation, we propose a reconstructive sequence-graph network (RSGN) that encodes frames and shots hierarchically as a sequence and a graph: frame-level dependencies are encoded by a long short-term memory (LSTM) network, and shot-level dependencies are captured by a graph convolutional network (GCN). Videos are then summarized by exploiting both the local and global dependencies among shots. In addition, a reconstructor is developed to reward the summary generator, so that the generator can be optimized in an unsupervised manner, which averts the need for annotated data in video summarization. Furthermore, under the guidance of the reconstruction loss, the predicted summary better preserves the main video content and the shot-level dependencies. Experimental results on three popular datasets (SumMe, TVSum, and VTW) demonstrate the superiority of the proposed approach on the summarization task.
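The hierarchical sequence-graph idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: mean pooling stands in for the frame-level LSTM, a single symmetrically normalized graph-convolution step models shot-level dependencies over a similarity graph, and all names, dimensions, and the cosine-similarity adjacency are illustrative assumptions.

```python
import numpy as np

def encode_shots(frame_feats_per_shot):
    # Stand-in for the frame-level LSTM encoder: mean-pool each shot's
    # frame features into one shot embedding (result: [num_shots, d]).
    return np.stack([f.mean(axis=0) for f in frame_feats_per_shot])

def gcn_layer(shot_embs, W):
    # One graph-convolution step over a fully connected shot graph.
    # Affinities come from cosine similarity (an assumption for this
    # sketch); the propagation follows the standard normalized form
    # H' = ReLU(D^{-1/2} A D^{-1/2} H W).
    norm = np.linalg.norm(shot_embs, axis=1, keepdims=True) + 1e-8
    A = (shot_embs / norm) @ (shot_embs / norm).T
    A = np.maximum(A, 0) + np.eye(len(A))   # non-negative edges + self-loops
    d = A.sum(axis=1)
    A_hat = A / np.sqrt(np.outer(d, d))     # symmetric normalization
    return np.maximum(A_hat @ shot_embs @ W, 0.0)  # ReLU

rng = np.random.default_rng(0)
# Toy video: three shots with 4, 6, and 5 frames of 8-d features.
shots = [rng.standard_normal((n, 8)) for n in (4, 6, 5)]
H = encode_shots(shots)          # [3, 8] local (inner-shot) embeddings
W = rng.standard_normal((8, 8))
H_global = gcn_layer(H, W)       # [3, 8] embeddings with global context
print(H_global.shape)
```

In the paper's full pipeline these globally contextualized shot embeddings would feed a summary generator, with a reconstructor providing an unsupervised reward; that part is omitted here.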