Computer Science
Artificial Intelligence
Pixel
Pattern recognition (psychology)
Computer Vision
Authors
Jiachen Dang,Yong Zhong,Xiaolin Qin
Identifier
DOI:10.1016/j.cviu.2024.103930
Abstract
Recently, transformer-based methods have become strong competitors to CNN-based methods on the low-light image enhancement task by employing self-attention for feature extraction. Transformer-based methods excel at modeling long-range pixel dependencies, which are essential for low-light image enhancement to achieve better lighting, natural colors, and higher contrast. However, the high computational cost of self-attention limits its use in low-light image enhancement, and some works struggle to balance accuracy and computational cost. In this work, we propose PPformer, a lightweight and effective network for low-light image enhancement built on a pixel-wise and patch-wise cross-attention mechanism. PPformer is a CNN-transformer hybrid network divided into three parts: a local branch, a global branch, and Dual Cross-Attention, each of which plays a vital role. Specifically, the local branch extracts local structural information using a stack of Wide Enhancement Modules, while the global branch provides refined global information through a Cross Patch Module and a Global Convolution Module. Moreover, unlike self-attention, we use the extracted global semantic information to guide the modeling of dependencies between local and non-local regions. By computing Dual Cross-Attention, PPformer can effectively restore images with better color consistency, natural brightness, and contrast. Benefiting from the proposed dual cross-attention mechanism, PPformer effectively captures dependencies at both the pixel and patch levels for the full-size feature map. Extensive experiments on eleven real-world benchmark datasets show that PPformer achieves better quantitative and qualitative results than previous state-of-the-art methods.
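The abstract gives no implementation details, but the following minimal PyTorch sketch illustrates the general idea of pixel-to-patch cross-attention it describes: queries are computed from the full-resolution (pixel-level) feature map, while keys and values come from a coarser set of patch-level tokens, so each pixel attends to global patch information instead of to all other pixels. The module name `LocalGlobalCrossAttention`, the pooling-based patch tokenization, and the parameters `dim` and `patch_size` are illustrative assumptions, not the authors' actual PPformer implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalGlobalCrossAttention(nn.Module):
    """Hypothetical sketch: queries from pixel-level features, keys/values from patch tokens."""

    def __init__(self, dim: int, patch_size: int = 8):
        super().__init__()
        self.patch_size = patch_size
        self.scale = dim ** -0.5
        self.to_q = nn.Conv2d(dim, dim, kernel_size=1)  # queries from the local feature map
        self.to_kv = nn.Linear(dim, dim * 2)            # keys/values from global patch tokens

    def forward(self, local_feat: torch.Tensor) -> torch.Tensor:
        # local_feat: (B, C, H, W) full-resolution feature map
        b, c, h, w = local_feat.shape

        # Form patch-level tokens by average-pooling the map to (H/p) x (W/p) cells.
        patches = F.adaptive_avg_pool2d(
            local_feat, (h // self.patch_size, w // self.patch_size)
        )
        tokens = patches.flatten(2).transpose(1, 2)             # (B, N_patches, C)

        q = self.to_q(local_feat).flatten(2).transpose(1, 2)    # (B, H*W, C)
        k, v = self.to_kv(tokens).chunk(2, dim=-1)              # (B, N_patches, C) each

        # Each pixel attends to every global patch token (pixel-to-patch attention).
        attn = (q @ k.transpose(-2, -1)) * self.scale            # (B, H*W, N_patches)
        attn = attn.softmax(dim=-1)
        out = attn @ v                                           # (B, H*W, C)
        return out.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    x = torch.randn(1, 32, 64, 64)
    module = LocalGlobalCrossAttention(dim=32, patch_size=8)
    print(module(x).shape)  # torch.Size([1, 32, 64, 64])
```

Under these assumptions, the attention cost scales with the number of pixels times the number of patch tokens rather than quadratically in the number of pixels, which is consistent with the paper's stated goal of balancing accuracy and computational cost.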