Authors
Yifan Zuo, Wenhao Yao, Yuqi Hu, Yuming Fang, Wei Liu, Yuxin Peng
Identifier
DOI: 10.1109/tip.2024.3444317
Abstract
Recently, transformer-based backbones have shown superior performance over their convolutional counterparts in computer vision. Because global attention has quadratic complexity in the number of tokens, local attention, with its linear complexity, is typically adopted in low-level image processing. However, the resulting limited receptive field harms performance. In this paper, motivated by Octave convolution, we propose a transformer-based single image super-resolution (SISR) model that explicitly embeds dynamic frequency decomposition into the standard local transformer. All frequency components are continuously updated and re-assigned via intra-scale attention and inter-scale interaction, respectively. Specifically, attention at low resolution suffices for low-frequency features, which both enlarges the receptive field and reduces complexity. Compared with the standard local transformer, the proposed FDRTran layer decreases both FLOPs and parameter count; by contrast, Octave convolution only decreases the FLOPs of the standard convolution while keeping the parameter count unchanged. In addition, a restart mechanism is proposed: after every few frequency updates, the low- and high-frequency components are first fused and then the features are decomposed again. In this way, the features can be decomposed from multiple viewpoints by learnable parameters, avoiding the risk of early saturation in the frequency representation. Furthermore, built on the FDRTran layer with the restart mechanism, the proposed FDRNet is the first transformer backbone for SISR to explore the Octave design. Extensive experiments show that our model reaches state-of-the-art performance on six synthetic and real datasets. The code and the models are available at https://github.com/catnip1029/FDRNet.
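The core idea inherited from Octave convolution can be illustrated with a toy sketch: part of the feature channels (the low-frequency branch) is stored and processed at half resolution, so attention over those tokens costs roughly (N/4)² instead of N² operations, a 16× reduction, while the high-frequency branch keeps full resolution; a "restart" step fuses both branches back to full resolution before the next decomposition. The snippet below is a minimal numpy sketch under these assumptions, not the paper's actual FDRTran layer; the function names (`octave_decompose`, `octave_fuse`) and the channel-split ratio `alpha` are hypothetical illustrations.

```python
import numpy as np

def avg_pool2(x):
    # 2x2 average pooling (downsample by 2); assumes even H and W
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample2(x):
    # nearest-neighbour 2x upsampling back to full resolution
    return x.repeat(2, axis=0).repeat(2, axis=1)

def octave_decompose(x, alpha=0.5):
    # Hypothetical split: the first alpha*C channels become the
    # low-frequency branch, kept at H/2 x W/2; the rest stay at
    # full resolution as the high-frequency branch.
    c_low = int(alpha * x.shape[-1])
    low = avg_pool2(x[..., :c_low])   # (H/2, W/2, c_low): 4x fewer tokens
    high = x[..., c_low:]             # (H, W, C - c_low)
    return low, high

def octave_fuse(low, high):
    # "Restart"-style fusion: lift the low branch back to full
    # resolution and concatenate with the high branch, so the
    # features can be decomposed again from a fresh viewpoint.
    return np.concatenate([upsample2(low), high], axis=-1)

# Usage: an 8x8 feature map with 4 channels.
x = np.random.rand(8, 8, 4)
low, high = octave_decompose(x)   # low: (4, 4, 2), high: (8, 8, 2)
y = octave_fuse(low, high)        # y: (8, 8, 4), same shape as x
```

In the real model the two branches would also be updated by intra-scale (windowed) attention and exchanged via inter-scale interaction between restarts; the sketch only shows why the low-resolution branch shrinks the attention token count.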