CTVSR: Collaborative Spatial–Temporal Transformer for Video Super-Resolution

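Computer science; Security token; Artificial intelligence; Temporal resolution; Image resolution; Transformer; Fuse (electrical); Computer vision; Pattern recognition (psychology); Computer security; Quantum mechanics; Electrical engineering; Physics; Engineering; Voltage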

Authors

Jun Tang, Chen-Yan Lu, Zhoufeng Liu, Jiale Li, Hang Dai, Yong Ding

Source

Journal: IEEE Transactions on Circuits and Systems for Video Technology [Institute of Electrical and Electronics Engineers]
Date: 2023-12-07; Volume (issue), pages: 34 (6): 5018-5032; Cited by: 1

Identifier

DOI: 10.1109/tcsvt.2023.3340439

Abstract

Video super-resolution (VSR) is important in video processing for reconstructing high-definition image sequences from corresponding continuous and highly-related video frames. However, existing VSR methods have limitations in fusing spatial-temporal information. Some methods only fuse spatial-temporal information on a limited range of total input sequences, while others adopt a recurrent strategy that gradually attenuates the spatial information. While recent advances in VSR utilize Transformer-based methods to improve the quality of the upscaled videos, these methods require significant computational resources to model the long-range dependencies, which dramatically increases the model complexity. To address these issues, we propose a Collaborative Transformer for Video Super-Resolution (CTVSR). The proposed method integrates the strengths of Transformer-based and recurrent-based models by concurrently assimilating the spatial information derived from multi-scale receptive fields and the temporal information acquired from temporal trajectories. In particular, we propose a Spatial Enhanced Network (SEN) with two key components: Token Dropout Attention (TDA) and Deformable Multi-head Cross Attention (DMCA). TDA focuses on the key regions to extract more informative features, and DMCA employs deformable cross attention to gather information from adjacent frames. Moreover, we introduce a Temporal-trajectory Enhanced Network (TEN) that computes the similarity of a given token with temporally-related tokens in the temporal trajectory, which is different from previous methods that evaluate all tokens within the temporal dimension. With comprehensive quantitative and qualitative experiments on four widely-used VSR benchmarks, the proposed CTVSR achieves competitive performance with relatively low computational consumption and high forward speed.
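The abstract's description of TEN (comparing a token only with the temporally related tokens on its trajectory, rather than with every token across time) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how trajectories are estimated or how the attention is parameterized, so the function name `trajectory_attention`, the toy features, and the hand-picked trajectory below are all assumptions made purely for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def trajectory_attention(query, frames, trajectory):
    """Attend only to the tokens that lie on one temporal trajectory.

    query:      feature vector of the token being enhanced
    frames:     frames[t][i] is the feature vector of token i in frame t
    trajectory: trajectory[t] is the index of the token in frame t that is
                assumed (hypothetically) to correspond to the query
    """
    # One key per frame: the token the trajectory points to.
    keys = [frames[t][i] for t, i in enumerate(trajectory)]
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    # Weighted sum of the trajectory tokens (keys double as values here).
    out = [sum(w * k[d] for w, k in zip(weights, keys))
           for d in range(len(query))]
    return out, weights

# Toy data: 3 frames, 4 tokens per frame, 2-dimensional features.
frames = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]],
    [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0], [0.5, 0.5]],
    [[0.0, 0.0], [0.5, 0.5], [1.0, 0.0], [0.0, 1.0]],
]
query = [1.0, 0.0]
trajectory = [0, 1, 2]  # hypothetical: the token shifts one position per frame

output, weights = trajectory_attention(query, frames, trajectory)
```

In this toy setup each query is compared with one token per frame (3 similarity computations), whereas attending over every token in the temporal dimension would require 12. That gap is the kind of saving the abstract attributes to restricting attention to the temporal trajectory.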
