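Field-programmable gate array
Computer science
Acceleration
Overlay
Computation
Transformer
Parallel computing
Embedded system
Electrical engineering
Engineering
Voltage
Operating system
Algorithm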
Authors
Yueyin Bai,Hao Zhou,Keqing Zhao,Jianli Chen,Jun Yu,Kun Wang
Identifier
DOI:10.1109/fccm57271.2023.00049
Abstract
Existing field-programmable gate array (FPGA) implementations of transformer networks either focus only on attention computation or are tied to a fixed model structure and lack flexibility. In this article, we propose an FPGA-based overlay processor, named Transformer-OPU, for general acceleration of transformer networks. Experimental results show that Transformer-OPU achieves 5.19-15.06× and 1.14-2.89× speedup compared with CPU and GPU, respectively. We also observe 1.10-2.47× better latency compared with previous customized FPGA accelerators, and Transformer-OPU is 1.45× faster than NPE.