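Computer science
Transformer
Encoder
Inference
Robustness (evolution)
Computer engineering
Artificial intelligence
Real-time computing
Engineering
Voltage
Biochemistry
Chemistry
Electrical engineering
Gene
Operating system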
Authors
Issa Khalifeh,Luka Murn,Marta Mrak,Ebroul Izquierdo
Identifiers
DOI:10.1109/icip49359.2023.10222296
Abstract
Video frame interpolation is an increasingly important research task with several key industrial applications in the video coding, broadcast and production sectors. Recently, transformers have been introduced to the field, resulting in substantial performance gains. However, this comes at the cost of greatly increased memory usage and longer training and inference times. In this paper, a novel method integrating a transformer encoder with convolutional features is proposed. The network reduces memory usage by close to 50% and runs up to four times faster at inference than existing transformer-based interpolation methods. A dual-encoder architecture is introduced that combines the strength of convolutions in modelling local correlations with that of the transformer in modelling long-range dependencies. Quantitative evaluations on various benchmarks with complex motion showcase the robustness of the proposed method, which achieves performance competitive with state-of-the-art interpolation networks.
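The abstract describes the dual-encoder idea only at a high level. The sketch below is a minimal, dependency-free illustration of the general principle, not the authors' network. A local branch (assumed here to be a depthwise 1-D convolution over a sequence of feature tokens) and a global branch (single-head scaled dot-product self-attention with identity query/key/value projections, a simplification assumed here) process the same features, and their outputs are concatenated per token. The function names `conv1d_local`, `self_attention_global` and `dual_encoder`, the 1-D token layout and the concatenation fusion are all hypothetical; the abstract does not specify the paper's actual layers, fusion scheme, or how features from the two input frames are combined.

```python
import math
from typing import List

# Hypothetical layout: a feature map flattened to tokens x channels.
Matrix = List[List[float]]


def conv1d_local(x: Matrix, kernel: List[float]) -> Matrix:
    """Local branch (illustrative): depthwise 1-D convolution along the
    token axis with zero padding, so each output token only sees its
    len(kernel) nearest neighbours."""
    k = len(kernel)
    pad = k // 2
    n, c = len(x), len(x[0])
    out = [[0.0] * c for _ in range(n)]
    for i in range(n):
        for j, w in enumerate(kernel):
            src = i + j - pad  # neighbour index covered by kernel tap j
            if 0 <= src < n:   # zero padding outside the sequence
                for ch in range(c):
                    out[i][ch] += w * x[src][ch]
    return out


def self_attention_global(x: Matrix) -> Matrix:
    """Global branch (illustrative): single-head scaled dot-product
    self-attention with identity Q/K/V projections, so every output token
    is a softmax-weighted mix of all input tokens."""
    n, d = len(x), len(x[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(n):
        # One score per token pair: this n x n table is the source of
        # attention's quadratic memory and compute cost.
        scores = [scale * sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(weights[j] * x[j][ch] for j in range(n)) for ch in range(d)])
    return out


def dual_encoder(x: Matrix, kernel: List[float]) -> Matrix:
    """Hypothetical fusion: concatenate local and global features per token,
    doubling the channel count."""
    local = conv1d_local(x, kernel)
    glob = self_attention_global(x)
    return [l + g for l, g in zip(local, glob)]
```

For example, with `tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]` and `kernel = [0.25, 0.5, 0.25]`, changing only the last token leaves the first token's local features unchanged but alters its global features, which is the local-versus-long-range split the abstract attributes to the two encoders. The global branch also builds a score for every token pair while the local branch touches only a fixed neighbourhood, which illustrates in general terms why attention-heavy designs are memory-hungry.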