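Convolutional neural network
Computer science
Field-programmable gate array
Deep learning
Feature extraction
Pipeline (software)
Hardware acceleration
Parallel computing
Convolution (computer science)
Verilog
Artificial neural network
Artificial intelligence
Embedded system
Computer engineering
Computational science
Computer hardware
Programming language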
Authors
Ru Ding, Guangda Su, Guoqiang Bai, Wei Xu, Nan Su, Xiaojie Wu
Identifier
DOI:10.1109/edssc.2019.8754067
Abstract
The Convolutional Neural Network (CNN), a typical deep learning model, has been widely used to solve many complex problems. However, its computation-intensive convolutional layers and memory-intensive fully connected layers limit the implementation of CNNs on embedded platforms. In this paper we propose an FPGA-based accelerator for face feature extraction that supports acceleration of the entire CNN. In our design, all CNN layers are optimized and deployed separately and independently using hand-coded Verilog templates rather than a high-level synthesis (HLS) tool. The RTL-designed layers can use a parallelism strategy tuned for the convolutional layers and a pipelined structure for the convolutional and pooling layers to achieve high resource utilization. For the fully connected layers, a batch-based method is applied to reduce the number of data accesses. Moreover, a dynamic fixed-point quantization strategy is adopted to reduce resource consumption. As a result, an “FPGA+ARM” system is used to complete the hardware acceleration of the CNN, and the precision error is less than 1% compared with the software implementation.
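The abstract does not spell out how the dynamic fixed-point quantization works. As a rough illustration of the general idea only, the Python sketch below keeps the word length fixed but chooses the number of fractional bits separately for each group of values (for example, per layer) from that group's largest magnitude. The 8-bit word length and the function names `choose_frac_bits`, `quantize`, and `dequantize` are assumptions made for this example and are not taken from the paper.

```python
import math


def choose_frac_bits(values, total_bits=8):
    """Pick the fractional length for one group of values (hypothetical helper).

    The word length is fixed at total_bits (an assumed value); the position
    of the binary point moves with the dynamic range of the group.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    # Bits needed left of the binary point (sign bit excluded).
    # This is negative for small values, which frees extra fractional bits.
    int_bits = math.floor(math.log2(max_abs)) + 1
    return total_bits - 1 - int_bits


def quantize(values, total_bits=8):
    """Map floats to signed integer codes that share one fractional length."""
    frac_bits = choose_frac_bits(values, total_bits)
    scale = 2.0 ** frac_bits
    qmin = -(1 << (total_bits - 1))
    qmax = (1 << (total_bits - 1)) - 1
    # Round to the nearest code and saturate at the word-length limits.
    codes = [min(qmax, max(qmin, round(v * scale))) for v in values]
    return codes, frac_bits


def dequantize(codes, frac_bits):
    """Recover approximate float values from the integer codes."""
    scale = 2.0 ** frac_bits
    return [c / scale for c in codes]


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.1]      # small range: more fractional bits
    activations = [3.2, -1.5, 0.75]  # larger range: fewer fractional bits
    for name, group in (("weights", weights), ("activations", activations)):
        codes, frac_bits = quantize(group)
        print(name, codes, frac_bits, dequantize(codes, frac_bits))
```

In this sketch the two groups end up with different fractional lengths even though both use the same word length. That is the property that lets a fixed-width datapath cover layers with very different value ranges.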