Computer science
Image resolution
Artificial intelligence
Face (sociological concept)
Transformer
Computer vision
Super-resolution
Pattern recognition (psychology)
Image (mathematics)
Voltage
Social science
Quantum mechanics
Physics
Sociology
Authors
Qiqi Bao, Yunmeng Liu, Bowen Gang, Wenming Yang, Qingmin Liao
Identifier
DOI:10.1109/tmm.2023.3238522
Abstract
Numerous CNN-based algorithms have been proposed to reconstruct high-quality face images. However, the inability of the convolution operation to model long-distance relationships limits the performance of CNN-based methods. Moreover, in the high-resolution (HR) image reconstruction stage, given well-decoded feature representations, a more efficient architecture can be explored to synthesize pixel-level image details. In this work, we propose a spatial attention-guided CNN-Transformer aggregation network (SCTANet) for face image super-resolution (FSR) tasks. The core component of the deep feature extraction stage is the Hybrid Attention Aggregation (HAA) block. The HAA block has two parallel paths: one for the Residual Spatial Attention (RSA) block and the other for the Multi-scale Patch embedding and Spatial-attention Masked Transformer (MPSMT) block. The HAA block combines the strengths of CNNs and Transformers to effectively exploit both local and global information. For the reconstruction stage, we propose a Sub-pixel MLP-based Upsampling (SMU) module in place of the conventional CNN architecture. The SMU module promotes the reconstruction of pixel-level image details while reducing computational complexity. Extensive experiments on both synthetic and real-world face datasets demonstrate the superiority of the proposed SCTANet over state-of-the-art methods.
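The abstract describes the architecture only at a high level, so below is a minimal PyTorch sketch of the two ideas it names: a hybrid block with parallel CNN and Transformer paths whose outputs are fused, and a sub-pixel, MLP-style upsampler. All module names (ResidualSpatialAttention, TransformerPath, HybridAttentionAggregation, SubPixelMLPUpsampling), the patch size, channel widths, and the concatenate-then-1x1-conv fusion are illustrative assumptions rather than the paper's exact design; in particular, TransformerPath is a plain Transformer encoder standing in for the MPSMT block and omits its multi-scale patch embedding and spatial-attention masking.

import torch
import torch.nn as nn

class ResidualSpatialAttention(nn.Module):
    """Hypothetical RSA path: conv features re-weighted by a 1-channel spatial attention map."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.attn = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, x):
        feat = self.body(x)
        return x + feat * self.attn(feat)  # residual connection around the attended features

class TransformerPath(nn.Module):
    """Hypothetical stand-in for the MPSMT path: patch embedding + one Transformer
    encoder layer over the flattened spatial tokens (global self-attention)."""
    def __init__(self, channels, patch=4, heads=4):
        super().__init__()
        dim = channels * patch * patch
        self.embed = nn.Conv2d(channels, dim, kernel_size=patch, stride=patch)
        self.encoder = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )
        self.unembed = nn.ConvTranspose2d(dim, channels, kernel_size=patch, stride=patch)

    def forward(self, x):
        tokens = self.embed(x)                   # B, D, H/p, W/p
        b, d, h, w = tokens.shape
        seq = tokens.flatten(2).transpose(1, 2)  # B, (H/p * W/p), D
        seq = self.encoder(seq)
        tokens = seq.transpose(1, 2).reshape(b, d, h, w)
        return self.unembed(tokens)              # back to B, C, H, W

class HybridAttentionAggregation(nn.Module):
    """Two parallel paths (CNN + Transformer) fused by channel concatenation and a 1x1 conv."""
    def __init__(self, channels):
        super().__init__()
        self.rsa = ResidualSpatialAttention(channels)
        self.transformer = TransformerPath(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        return x + self.fuse(torch.cat([self.rsa(x), self.transformer(x)], dim=1))

class SubPixelMLPUpsampling(nn.Module):
    """Hypothetical SMU: a per-pixel MLP (1x1 convs) expands channels to 3 * r^2,
    then PixelShuffle rearranges them into an r-times-larger RGB image."""
    def __init__(self, channels, scale=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels, 1),
            nn.GELU(),
            nn.Conv2d(channels, 3 * scale * scale, 1),
        )
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        return self.shuffle(self.mlp(x))

if __name__ == "__main__":
    feat = torch.randn(1, 32, 16, 16)       # decoded 16x16 feature map
    haa = HybridAttentionAggregation(32)
    smu = SubPixelMLPUpsampling(32, scale=4)
    sr = smu(haa(feat))
    print(sr.shape)                          # torch.Size([1, 3, 64, 64])

Run on a 1x32x16x16 feature map, the sketch produces a 1x3x64x64 output, i.e. x4 super-resolution; the actual network would stack many HAA blocks and use the paper's own channel, patch, and scale settings.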