Computer Science
Deep Learning
Inference
Field-Programmable Gate Array (FPGA)
Quantization (signal processing)
Artificial Intelligence
Hardware Acceleration
Artificial Neural Network
Floating Point
Machine Learning
Computer Engineering
Embedded Systems
Computer Vision
Algorithm
Identifier
DOI:10.1109/paap54281.2021.9720468
Abstract
In the past few decades, with the large-scale application of deep learning technology, neural network inference speed has become an increasingly severe problem, especially in mobile and embedded real-time processing systems. To improve inference speed, previous researchers have proposed various software optimization techniques. However, these methods focus only on the model itself and still cannot keep pace with the rapid development of large and complex deep learning models. Fortunately, deep neural network (DNN) accelerators based on FPGA SoCs have opened a promising opportunity for real-time inference. In this paper, we propose a novel 16-bit dynamic fixed-point quantization method to map the object detection network YOLOv4-tiny onto an FPGA-based heterogeneous deep learning accelerator. We evaluate this model on a Xilinx Zynq-7020 SoC on the ZedBoard platform. Experiments on the COCO dataset show that this approach improves model inference speed to 4× that of a pure CPU platform, while the accuracy loss is negligible at only 3-5%. This mapping method is competitive with, or even superior to, other state-of-the-art acceleration methods in the deep learning field.
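The abstract does not include code, so the following is only a minimal NumPy sketch of the general idea behind dynamic fixed-point quantization, not the authors' exact scheme: each tensor shares one fractional length, chosen per-tensor from its dynamic range so the largest magnitude still fits in a signed 16-bit word. The function names and the fractional-length heuristic here are assumptions for illustration.

```python
import numpy as np

def quantize_dynamic_fixed_point(x, word_bits=16):
    """Quantize a float tensor to dynamic fixed-point with a shared,
    per-tensor fractional length (illustrative heuristic, not the
    paper's exact method)."""
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return np.zeros_like(x, dtype=np.int16), word_bits - 1
    # Integer bits needed so 2**int_bits > max_abs; the rest of the
    # word (minus the sign bit) holds the fraction. int_bits may be
    # negative for small-magnitude tensors, giving extra precision.
    int_bits = int(np.floor(np.log2(max_abs))) + 1
    frac_bits = word_bits - 1 - int_bits
    scale = 2.0 ** frac_bits
    qmin, qmax = -(2 ** (word_bits - 1)), 2 ** (word_bits - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax).astype(np.int16)
    return q, frac_bits

def dequantize(q, frac_bits):
    """Recover approximate float values from the fixed-point tensor."""
    return q.astype(np.float32) / (2.0 ** frac_bits)

# Example: quantize a small weight tensor and check the rounding error.
w = np.random.randn(4, 4).astype(np.float32)
q, fl = quantize_dynamic_fixed_point(w)
w_hat = dequantize(q, fl)
print("fractional bits:", fl, "max abs error:", np.abs(w - w_hat).max())
```

Because the fractional length adapts to each tensor's range, small-valued weights keep more fractional precision than a single global fixed-point format would allow, which is what keeps the accuracy loss of 16-bit quantization small.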