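Field-programmable gate array
Computer science
Acceleration
Convolutional neural network
Cache
Pipeline (software)
Parallel computing
Hardware acceleration
Convolution (computer science)
Computer hardware
Embedded system
Artificial neural network
Artificial intelligence
Classical mechanics
Programming language
Physics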
Authors
Tao Xie, Yingjie Ma, Wenbin Feng, Le Chang, Chongchong Yu
Identifier
DOI:10.1109/cac53003.2021.9728099
Abstract
To address the problem that convolutional neural network models are constrained by the limited resources of embedded platforms, this paper designs an object detection platform accelerated by an FPGA-based heterogeneous chip. The platform uses the FPGA heterogeneous chip to accelerate a compressed YOLO v2 model. During compression, channel pruning removes redundant channels from the convolutional neural network, reducing the number of model parameters and therefore the hardware storage required. The hardware platform is a Xilinx PYNQ-Z2 board: images are first preprocessed on the ARM processor, and the processed image data and model parameters are then transferred to the FPGA over the AXI bus, where the convolutional network is accelerated layer by layer. For this platform, the paper further studies how nonlinear functions can be implemented in the FPGA to improve the acceleration. Experimental results show that, after adding a PE split cache data pipeline, the average processing time per image is 530 ms, 20 ms faster than without the pipeline, with an average precision of 0.7582. Part of the nonlinear operations is approximated in the FPGA by a function mapping method or by a segmented Taylor expansion to achieve further acceleration.
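The abstract does not state which criterion is used to decide that a channel is redundant. As a minimal sketch, assuming the common L1-norm heuristic (the output filters with the smallest sum of absolute weights are pruned), channel pruning of one convolution layer and the matching input-channel removal in the next layer could look like the Python below. The function names and the `keep_ratio` parameter are hypothetical, and weights are plain nested lists shaped `[out_channels][in_channels][kh][kw]` to keep the example dependency-free.

```python
import math


def channel_l1_norms(weights):
    """Return the L1 norm of each output channel's filter.

    `weights` is a nested list shaped [out_ch][in_ch][kh][kw].
    """
    norms = []
    for filt in weights:
        total = 0.0
        for plane in filt:
            for row in plane:
                for w in row:
                    total += abs(w)
        norms.append(total)
    return norms


def prune_channels(weights, keep_ratio):
    """Keep the output channels with the largest L1 norms.

    Assumed (hypothetical) criterion: small-norm filters are treated as
    redundant. Returns the kept channel indices (in original order) and
    the pruned weight list.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    norms = channel_l1_norms(weights)
    n_keep = max(1, math.ceil(len(norms) * keep_ratio))
    # Rank channels by norm, take the strongest, then restore original order
    ranked = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return kept, [weights[i] for i in kept]


def drop_input_channels(next_weights, kept):
    """Remove the matching input channels from the following layer."""
    return [[filt[i] for i in kept] for filt in next_weights]


if __name__ == "__main__":
    # Four 1x1 filters over a single input channel
    w = [[[[0.1]]], [[[-2.0]]], [[[0.5]]], [[[1.5]]]]
    kept, pruned = prune_channels(w, keep_ratio=0.5)
    print(kept)  # indices of the two largest-norm filters: [1, 3]
```

Removing output channels of one layer also removes the corresponding input channels of the next layer, which is why both functions are needed; in practice a pruned network is usually fine-tuned afterwards to recover accuracy.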
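The abstract names two ways of fitting nonlinear operations in hardware, function mapping and segmented Taylor expansion, but gives no details and does not say which functions are fitted. The sketch below assumes the target is the sigmoid (YOLO v2 applies it to box offsets and objectness scores) and models both ideas in Python: `LutSigmoid` treats "function mapping" as a precomputed lookup table, and `PiecewiseTaylorSigmoid` stores second-order Taylor coefficients per segment, much as a ROM might on the FPGA. The class names, input range, and segment and table sizes are illustrative assumptions, and floating point stands in for the fixed-point arithmetic real hardware would use.

```python
import math


def sigmoid(x):
    """Reference sigmoid used to build the tables."""
    return 1.0 / (1.0 + math.exp(-x))


class LutSigmoid:
    """Function mapping: one precomputed value per uniform input bin."""

    def __init__(self, lo=-8.0, hi=8.0, entries=256):
        self.lo, self.hi = lo, hi
        self.step = (hi - lo) / entries
        # Sample each bin at its midpoint to halve the worst-case error
        self.table = [sigmoid(lo + (i + 0.5) * self.step) for i in range(entries)]

    def __call__(self, x):
        if x <= self.lo:
            return 0.0  # saturate below the table range
        if x >= self.hi:
            return 1.0  # saturate above the table range
        i = min(int((x - self.lo) / self.step), len(self.table) - 1)
        return self.table[i]


class PiecewiseTaylorSigmoid:
    """Segmented second-order Taylor expansion around each segment's center."""

    def __init__(self, lo=-8.0, hi=8.0, segments=32):
        self.lo, self.hi, self.segments = lo, hi, segments
        self.width = (hi - lo) / segments
        self.coeffs = []  # (center, c0, c1, c2) per segment, like a ROM table
        for k in range(segments):
            c = lo + (k + 0.5) * self.width
            s = sigmoid(c)
            d1 = s * (1.0 - s)          # sigmoid'(c)
            d2 = d1 * (1.0 - 2.0 * s)   # sigmoid''(c)
            self.coeffs.append((c, s, d1, d2 / 2.0))

    def __call__(self, x):
        if x <= self.lo:
            return 0.0
        if x >= self.hi:
            return 1.0
        k = min(int((x - self.lo) / self.width), self.segments - 1)
        c, c0, c1, c2 = self.coeffs[k]
        dx = x - c
        return c0 + dx * (c1 + dx * c2)  # Horner form: two multiply-adds


if __name__ == "__main__":
    taylor, lut = PiecewiseTaylorSigmoid(), LutSigmoid()
    for x in (-3.0, 0.0, 1.7):
        print(x, sigmoid(x), taylor(x), lut(x))
```

Outside `[lo, hi]` both approximations saturate to 0 or 1. With these illustrative sizes the Taylor version stays within about 1e-3 of the true sigmoid using 96 stored coefficients, while the 256-entry table stays within about 1e-2, which reflects the usual trade-off: the lookup table needs no multipliers, and the Taylor segments spend two multiply-adds per evaluation to save storage.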