Computer science
Field-programmable gate array
Inference
Convolutional neural network
Computation
Convolution (computer science)
Computer engineering
Parallel computing
Hardware acceleration
Transformer
Energy efficiency
Architecture
Artificial intelligence
Computational science
Embedded system
Artificial neural network
Algorithm
Physics
Quantum mechanics
Voltage
Electrical engineering
Engineering
Art
Visual arts
Authors
Tianyang Li, Fan Zhang, Xitian Fan, Jianliang Shen, Wei Guo, Wei Cao
Identifier
DOI:10.1109/iscas46773.2023.10182145
Abstract
Many models that combine Transformers with convolutional neural networks (CNNs) for computer vision tasks have achieved state-of-the-art results. However, because attention and convolution have different computation patterns, using a dedicated Transformer or CNN accelerator inevitably reduces the computing efficiency of the other. To overcome this problem, we propose a unified architecture for attention and convolution on FPGA. We reduce runtime overhead by offloading part of the self-attention computation offline, before inference. Furthermore, we present a unified mapping method based on the computing characteristics of attention-based and convolution-based models. The accelerator implements multi-head attention in the Transformer, a standalone ResNet-50, and the hybrid blocks of attention and convolution in BoTNet-50 at 200 MHz on a Xilinx Virtex UltraScale+ XCVU37P. Experimental results show that the solution is nearly 3.62 times more energy-efficient than the NVIDIA V100 GPU, and its computational efficiency is 11.86% and 28.29% higher than state-of-the-art Transformer and ResNet-50 accelerators, respectively.
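The unified mapping the abstract refers to rests on a general observation: convolution can be lowered to a dense matrix multiplication via an im2col transform, and each self-attention head is already a sequence of matrix multiplications, so one matrix-multiply engine can serve both kinds of layer. The NumPy sketch below illustrates that equivalence only; it is not the authors' implementation, and the function names (gemm, conv2d_as_gemm, attention_head_as_gemm), shapes, and the 3x3 kernel choice are illustrative assumptions.

```python
# Minimal sketch: both convolution (after im2col) and a self-attention head
# reduce to calls on the same GEMM primitive, which is the kind of shared
# compute unit a unified FPGA accelerator can expose. Illustrative only.
import numpy as np

def gemm(a, b):
    """Stand-in for a shared matrix-multiply engine."""
    return a @ b

def conv2d_as_gemm(x, w):
    """3x3 convolution (stride 1, no padding) lowered to one GEMM via im2col.
    x: (H, W, C_in); w: (3, 3, C_in, C_out)."""
    H, W, C_in = x.shape
    kh, kw, _, C_out = w.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    # im2col: flatten every kh x kw x C_in patch into one row.
    cols = np.stack([
        x[i:i + kh, j:j + kw, :].reshape(-1)
        for i in range(out_h) for j in range(out_w)
    ])                                    # (out_h*out_w, kh*kw*C_in)
    w_mat = w.reshape(-1, C_out)          # (kh*kw*C_in, C_out)
    return gemm(cols, w_mat).reshape(out_h, out_w, C_out)

def attention_head_as_gemm(x, wq, wk, wv):
    """Single attention head expressed with the same GEMM primitive.
    x: (N, d_model); wq/wk/wv: (d_model, d_head)."""
    q, k, v = gemm(x, wq), gemm(x, wk), gemm(x, wv)
    scores = gemm(q, k.T) / np.sqrt(wq.shape[1])
    # Row-wise softmax over the attention scores.
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return gemm(probs, v)

if __name__ == "__main__":
    x_img = np.random.randn(8, 8, 16)
    w_conv = np.random.randn(3, 3, 16, 32)
    print(conv2d_as_gemm(x_img, w_conv).shape)               # (6, 6, 32)

    x_seq = np.random.randn(49, 64)
    wq = wk = wv = np.random.randn(64, 16)
    print(attention_head_as_gemm(x_seq, wq, wk, wv).shape)   # (49, 16)
```

Note that the Q, K, and V projection weights in the sketch are fixed at inference time, which is why part of the self-attention computation can in principle be prepared offline, as the abstract describes; how the paper partitions that work is not shown here.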