Author
Tianguang Zhang, Wei Zhang, Zheng Zhang, Yan Gan
Identifier
DOI:10.1016/j.patrec.2023.04.013
Abstract
Recently, Transformers have shown great potential in computer vision tasks such as classification, detection, segmentation, and image synthesis. The success of Transformers has long been attributed to the attention-based token mixer. However, the computational complexity of the attention-based token mixer is quadratic in the number of tokens to be mixed, so it requires more parameters and incurs a very large amount of computation. For the image synthesis task in particular, the attention-based token mixer increases the computational cost of Transformer-based generative adversarial networks (GANs). To address this problem, we propose the PFGAN method. The motivation is based on our observation that the computational complexity of pooling is linear in the sequence length and that pooling introduces no learnable parameters. Based on this observation, we use pooling rather than self-attention as the token mixer. Experimental results on the CelebA, CIFAR-10 and LSUN datasets demonstrate that our proposed method has fewer parameters and lower computational complexity.
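The abstract does not spell out PFGAN's architecture, but the replacement it describes, pooling in place of self-attention as the token mixer, matches a PoolFormer-style design. Below is a minimal PyTorch sketch of such a mixer under that assumption; the class name, pool size, and token-grid layout are illustrative choices, not the paper's implementation.

import torch
import torch.nn as nn

class PoolingTokenMixer(nn.Module):
    """Sketch of a pooling token mixer (hypothetical, PoolFormer-style).

    Average pooling mixes each token with its spatial neighbors.
    Its cost is linear in the number of tokens and it has no
    learnable parameters, unlike quadratic-cost self-attention.
    """
    def __init__(self, pool_size: int = 3):
        super().__init__()
        # Stride 1 with same-size padding keeps the token grid shape;
        # count_include_pad=False averages border tokens correctly.
        self.pool = nn.AvgPool2d(pool_size, stride=1,
                                 padding=pool_size // 2,
                                 count_include_pad=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) grid of tokens.
        # Subtracting x leaves only the neighbor information, since a
        # residual connection outside the mixer re-adds the input.
        return self.pool(x) - x

# Usage: mix an 8x8 grid of 256-dimensional tokens.
tokens = torch.randn(1, 256, 8, 8)
mixed = PoolingTokenMixer()(tokens)
print(mixed.shape)  # torch.Size([1, 256, 8, 8])

Because the mixer is parameter-free, any model capacity in such a design lives entirely in the surrounding channel MLPs and normalization layers, which is consistent with the abstract's claim of fewer parameters.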