Keywords: Field Programmable Gate Array, Computer Science, Architecture, Acceleration, Deep Learning, Computer Architecture, Embedded Systems, Parallel Computing, Artificial Intelligence, Classical Mechanics, Physics, Art, Visual Arts
Authors
Yonggen Li, Xin Li, Haibin Shen, Jicong Fan, Yanfeng Xu, Kejie Huang
Source
Journal: ACM Transactions on Reconfigurable Technology and Systems [Association for Computing Machinery]
Date: 2024-01-15
Volume/Issue: 17 (1): 1-27
Abstract
The Field Programmable Gate Array (FPGA) is a versatile and programmable hardware platform, which makes it a promising candidate for accelerating Deep Neural Networks (DNNs). However, an FPGA's computing energy efficiency is low because interconnect data movement dominates its energy consumption. In this article, we propose an all-digital Compute-in-Memory (CIM) FPGA architecture for deep learning acceleration. Furthermore, we present a bit-serial computing circuit for the digital CIM core that accelerates vector-matrix multiplication (VMM) operations. A Network-CIM-deployer (NCIMD) is also developed to support automatic deployment and mapping of DNN networks. NCIMD provides a user-friendly API for DNN models in Caffe format. Meanwhile, we introduce a weight-stationary dataflow and describe the method of mapping a single layer of the network onto the CIM array in the architecture. We conduct experimental tests on the proposed FPGA architecture in the field of Deep Learning (DL), as well as in non-DL fields, using different architectural layouts and mapping strategies, and compare the results with a conventional FPGA architecture. The experimental results show that, compared to the conventional FPGA architecture, our proposed CIM FPGA architecture improves energy efficiency by up to 16.1× while reducing latency by up to 40%.
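The bit-serial VMM scheme mentioned in the abstract can be illustrated with a minimal software sketch: inputs are streamed into the weight-stationary array one bit-plane per cycle, and the partial sums are shift-accumulated to recover the full-precision product. The function name, bit width, and NumPy modeling below are illustrative assumptions, not the paper's actual circuit or toolflow.

```python
# Hedged sketch of bit-serial vector-matrix multiplication (VMM):
# the weight matrix W stays stationary while the input vector x is
# streamed one bit-plane at a time; each cycle computes a binary
# vector times the full matrix, and shift-and-accumulate restores
# each bit's binary weight. All names here are illustrative.
import numpy as np

def bit_serial_vmm(x, W, n_bits=8):
    """Compute x @ W by streaming x one bit-plane at a time.

    x : 1-D array of unsigned integer activations
    W : 2-D weight matrix held stationary in the CIM array
    """
    acc = np.zeros(W.shape[1], dtype=np.int64)
    for b in range(n_bits):
        # Extract bit-plane b of every input element (each entry 0 or 1).
        plane = (x >> b) & 1
        # One array cycle: binary input vector times the whole matrix.
        partial = plane @ W
        # Shift-and-accumulate gives the bit its weight of 2**b.
        acc += partial.astype(np.int64) << b
    return acc

# Sanity check against ordinary integer matrix multiplication.
x = np.array([3, 5, 7], dtype=np.uint8)
W = np.arange(6).reshape(3, 2)
assert np.array_equal(bit_serial_vmm(x, W), x.astype(np.int64) @ W)
```

For signed or multi-bit weights, hardware designs typically extend this loop with a second (weight-bit) dimension or a sign-correction term; the sketch keeps only the input-bit serialization that the abstract's circuit targets.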