Inpainting
Computer Science
Artificial Intelligence
Transformer
Convolutional Neural Network
Computational Complexity Theory
Benchmark
Image (mathematics)
Image Resolution
Quadratic Equation
Face (sociological concept)
Pattern Recognition (psychology)
Computer Vision
Algorithm
Mathematics
Authors
Yifan Deng, Le Wang, Sanping Zhou, Deyu Meng, Jinjun Wang
Identifier
DOI:10.1145/3503161.3548446
Abstract
Benefiting from powerful convolutional neural networks (CNNs), learning-based image inpainting methods have made significant breakthroughs over the years. However, certain properties of CNNs (e.g., the local prior and spatially shared parameters) limit their performance on broken images with diverse and complex forms. Recently, a class of attention-based network architectures, called transformers, has shown strong performance in natural language processing and high-level vision tasks. Compared with CNNs, attention operators are better at long-range modeling and have dynamic weights, but their computational complexity is quadratic in the spatial resolution, which makes them less suitable for applications involving higher-resolution images, such as image inpainting. In this paper, we design a novel attention mechanism whose cost is linear in the resolution, derived from a Taylor expansion. Based on this attention, we build a network called T-former for image inpainting. Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
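The abstract's central point is that a Taylor expansion of the softmax attention kernel lets the cost of attention grow linearly, rather than quadratically, with the number of spatial positions. The NumPy sketch below illustrates that general idea only: it is a minimal illustration of first-order linearized attention under assumed shapes and function names, not the authors' exact T-former formulation.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Standard attention: O(N^2 * d) time and O(N^2) memory,
    where N is the number of tokens (flattened spatial positions)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])                # (N, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                      # (N, d)

def taylor_linear_attention(q, k, v, eps=1e-6):
    """Linearized attention via a first-order Taylor expansion of the
    exponential kernel: exp(q . k) ~= 1 + q . k, with q and k
    L2-normalized so the approximated weights stay non-negative.
    Cost is O(N * d^2), i.e. linear in the number of positions."""
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + eps)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + eps)
    kv = k.T @ v                            # (d, d), shared across all queries
    k_sum = k.sum(axis=0)                   # (d,)
    numer = v.sum(axis=0) + q @ kv          # (N, d)
    denom = k.shape[0] + q @ k_sum + eps    # (N,)
    return numer / denom[:, None]

# Toy usage: a 32x32 feature map with 64 channels -> N = 1024 tokens.
rng = np.random.default_rng(0)
n, d = 32 * 32, 64
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = taylor_linear_attention(q, k, v)
print(out.shape)  # (1024, 64)
```

The key design choice is that the (d, d) matrix k.T @ v is computed once and reused by every query, so doubling the image resolution only doubles the attention cost instead of quadrupling it as in the softmax version.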