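Computer science
Motion interpolation
Interpolation (computer graphics)
Frame (networking)
Computer vision
Motion (physics)
Artificial intelligence
Computer graphics (images)
Generative grammar
Block-matching algorithm
Video processing
Video tracking
Telecommunications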
Authors
Yuheng Huang,Jia Xu,Xin Su,Lu Zhang,Xiaomin Li,Qinghe Wang,Huchuan Lu
Identifier
DOI:10.1007/978-981-97-8792-0_28
Abstract
Video frame interpolation (VFI) is a challenging task, especially in scenes with large motions. Most existing methods rely on optical flow, which is difficult to predict when motions are large. Because they also lack prior image knowledge, they tend to generate intermediate frames with artifacts when the predicted optical flow is wrong. In this paper, we propose a novel method based on a pre-trained latent diffusion model (LDM). First, we freeze most of the parameters to preserve the rich image prior knowledge and powerful generation capabilities of the LDM. Second, we inflate our model to handle videos and adopt a multi-scale spatial-temporal attention module to improve its handling of large motions. Finally, information from the input frames is used to help reconstruct details in the output frames, further improving their quality. The experimental results demonstrate that our method performs well on both natural and animated videos with large motions. Specifically, our method achieves state-of-the-art performance on the animated dataset, producing outputs with nearly no artifacts.
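The failure mode the abstract attributes to flow-based methods can be illustrated with a toy sketch. The code below is not the paper's method: it assumes 1-D "frames", a single global integer flow value, and a hypothetical helper named `interpolate_midframe`. It only shows that a correct flow places the object at the midpoint, while a wrong flow leaves two half-intensity ghost copies.

```python
def interpolate_midframe(frame0, frame1, flow):
    """Toy flow-based interpolation of the middle frame (illustrative only).

    frame0, frame1: 1-D lists of pixel intensities with equal length.
    flow: assumed global displacement in pixels from frame0 to frame1.
    """
    n = len(frame0)
    half = flow // 2          # how far frame0 content moves to reach the midpoint
    rest = flow - half        # how far frame1 content moves back to reach it
    mid = [0.0] * n
    for x in range(n):
        x0 = x + half         # forward-warp a frame0 pixel
        x1 = x - rest         # backward-warp a frame1 pixel
        if 0 <= x0 < n:
            mid[x0] += 0.5 * frame0[x]
        if 0 <= x1 < n:
            mid[x1] += 0.5 * frame1[x]
    return mid


# A single bright pixel moves from index 1 to index 7 (a large motion of 6 pixels).
frame0 = [0.0] * 10
frame1 = [0.0] * 10
frame0[1] = 1.0
frame1[7] = 1.0

correct = interpolate_midframe(frame0, frame1, flow=6)  # object lands at index 4
wrong = interpolate_midframe(frame0, frame1, flow=0)    # ghosts stay at indices 1 and 7
print(correct)
print(wrong)
```

With the true flow of 6, both warped copies coincide at index 4. With a mispredicted flow of 0, the result degenerates to a plain average of the two inputs, which is the kind of ghosting artifact that an image prior is meant to suppress.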