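Field-programmable gate array
Transformer
Computer science
Encoder
Embedded system
Computer hardware
Computer architecture
Electrical engineering
Engineering
Voltage
Operating system
Identification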
DOI:10.1109/fpl60245.2023.00012
Abstract
Transformer neural networks, built on an encoder-decoder architecture of attention layers, have achieved remarkable performance in both Natural Language Processing (NLP) and Computer Vision (CV) applications. However, implementing transformers on resource-constrained devices is challenging because of their very large network structures and complex dataflows. Field-Programmable Gate Arrays (FPGAs) are a promising platform for Neural Network (NN) acceleration owing to their design flexibility and customizability, yet existing FPGA-based transformer implementations suffer from limited efficiency and generality. This paper proposes HPTA, a high-performance accelerator for implementing transformers on FPGAs. We analyze the structural features of transformer networks and design the accelerator with configurable processing elements, optimized data selection and arrangement, and an efficient memory subsystem, so that it supports a variety of transformers. We evaluate HPTA on BERT and Swin Transformer, representative transformer models for NLP and CV, respectively. Relative to a CPU implementation, HPTA reduces inference time by up to 44× for BERT and 29× for Swin Transformer; relative to a GPU implementation, it improves energy efficiency by up to 17× and 10×, respectively. Compared with existing FPGA-based accelerators, HPTA achieves inference-time speedups of up to 1.3× over NPE and 1.8× over Vis-TOP.
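The attention layers referred to above are built on the standard scaled dot-product attention, softmax(QK^T / sqrt(d_k))V: two matrix products and a row-wise softmax, which is the kind of workload a transformer accelerator must map onto its processing elements. The pure-Python sketch below computes a single attention head for illustration only. The function names are hypothetical, and the sketch makes no claim about HPTA's actual dataflow, processing-element design, or numeric formats, none of which the abstract specifies.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive (n x k) @ (k x m) -> (n x m) matrix multiply."""
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    out = []
    for row in a:
        mx = max(row)
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V (hypothetical helper)."""
    d_k = len(q[0])
    scores = matmul(q, transpose(k))          # (n_q x n_k) similarity scores
    scale = 1.0 / math.sqrt(d_k)
    scores = [[x * scale for x in row] for row in scores]
    weights = softmax_rows(scores)            # each row sums to 1
    return matmul(weights, v)                 # weighted sum of value rows


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    for row in scaled_dot_product_attention(q, k, v):
        print([round(x, 4) for x in row])
```

Because the score matrix QK^T has one entry per query-key pair, its size grows with the square of the sequence length, which is one reason large transformers strain resource-constrained devices.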