Throughput
Artificial neural network
Computer hardware
Control reconfiguration
Computer architecture
Authors
Ephrem C. Wu, Zhang Xiaoqian, Berman David, Inkeun Cho
Source
Journal: Field-Programmable Logic and Applications
Date: 2017-09-01
Volume/issue: 1-4
Citations: 20
Identifier
DOI:10.23919/fpl.2017.8056794
Abstract
FPGA-based neural networks typically leave performance on the table because the DSP resources run at less than a third of the peak clock rate. This paper presents a processing array architected to consistently achieve timing closure at 100% of the peak DSP clock rate with standard FPGA tools. In the HDL design environment, our processing array operates at the peak DSP clock rates on Xilinx UltraScale (741 MHz) and UltraScale+ (891 MHz) devices. To enhance portability and consistency of timing closure, this array operates at a high clock rate while data SRAMs run at a fraction of this rate. As a proof of concept, this paper outlines a processing array for matrix multiplication and convolution, the most compute-intensive operations of a convolutional neural network (CNN).
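The clock figures in the abstract can be put side by side with a few lines of arithmetic. The sketch below is only an illustration: the two peak rates come from the abstract, while the function names and the SRAM clock divisor of 2 are hypothetical, since the abstract says only that the SRAMs run at "a fraction" of the array clock.

```python
# Peak DSP clock rates quoted in the abstract, in MHz.
PEAK_DSP_MHZ = {"UltraScale": 741.0, "UltraScale+": 891.0}


def typical_ceiling_mhz(peak_mhz: float) -> float:
    """Upper bound on a DSP clock that is 'less than a third' of peak."""
    return peak_mhz / 3.0


def sram_clock_mhz(array_mhz: float, divisor: int) -> float:
    """SRAM clock when it runs at 1/divisor of the array clock.

    The divisor is a hypothetical parameter; the abstract does not
    state the actual ratio.
    """
    if divisor < 1:
        raise ValueError("divisor must be a positive integer")
    return array_mhz / divisor


if __name__ == "__main__":
    for device, peak in PEAK_DSP_MHZ.items():
        ceiling = typical_ceiling_mhz(peak)
        # Assumed divisor of 2, for illustration only.
        sram = sram_clock_mhz(peak, divisor=2)
        print(f"{device}: peak {peak:.0f} MHz, "
              f"typical designs below {ceiling:.0f} MHz, "
              f"SRAM at {sram:.1f} MHz if divisor were 2")
```

Under the abstract's own claim, a DSP slice clocked at its peak rate performs more than three times as many operations per second as one in a typical design, which is the headroom the processing array is meant to recover.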