Computer science
Field-programmable gate array
Edge device
Convolutional neural network
Computer hardware
Embedded system
Computer engineering
Parallel computing
Artificial intelligence
Cloud computing
Operating system
Authors
Zhichao Zhang, M. A. Parvez Mahmud, Abbas Z. Kouzani
Source
Journal: IEEE Internet of Things Journal (Institute of Electrical and Electronics Engineers)
Date: 2022-05-30
Volume/Issue: 9 (21): 21357-21369
Citations: 23
Identifier
DOI: 10.1109/jiot.2022.3179016
Abstract
Executing deep neural networks (DNNs) on resource-constrained edge devices, such as drones, offers low inference latency, high data privacy, and reduced network traffic. However, deploying DNNs on such devices is a challenging task. During DNN inference, intermediate results require significant data movement and frequent off-chip memory (DRAM) accesses, which decrease the inference speed and power efficiency. To address this issue, this article presents a field-programmable gate array (FPGA)-based convolutional neural network (CNN) accelerator, named FitNN, which improves the speed and power efficiency of CNN inference by reducing data movement. FitNN adopts the pretrained iSmart2 CNN, which is composed of depthwise and pointwise blocks in the MobileNet structure. A cross-layer dataflow strategy is proposed to reduce off-chip transfers of feature maps. In addition, multilevel buffers are proposed to keep the most needed data on-chip (in block RAM) and to avoid off-chip data reorganization and reloading. Finally, a computation core is proposed that performs the depthwise, pointwise, and max-pooling computations as soon as the data arrive, without reorganization, which suits the real-life scenario in which data arrive in sequence. In our experiments, FitNN is implemented on two FPGA-based platforms (both at 150 MHz), Ultra96-V2 and PYNQ-Z1, for drone-based object detection with a batch size of 1. The results show that FitNN achieves 15 frames per second (FPS) on Ultra96-V2 with a power consumption of 4.69 W. On PYNQ-Z1, FitNN achieves 9 FPS with 1.9 W of power consumption. Compared with the previous FPGA-based implementation of the iSmart2 CNN, FitNN increases efficiency (FPS/W) by 2.37 times.
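To make the depthwise/pointwise structure mentioned in the abstract concrete, below is a minimal NumPy sketch of one MobileNet-style block: a 3x3 depthwise convolution, a 1x1 pointwise convolution, and 2x2 max pooling. All shapes, channel counts, kernel sizes, and the use of ReLU here are illustrative assumptions for the sketch, not values or design details taken from the FitNN accelerator or the iSmart2 CNN.

import numpy as np

def depthwise_conv3x3(x, w):
    """x: (H, W, C) feature map; w: (3, 3, C), one 3x3 filter per channel."""
    H, W, C = x.shape
    pad = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            # Each channel is filtered independently (depthwise).
            out[i, j] = np.sum(pad[i:i + 3, j:j + 3] * w, axis=(0, 1))
    return out

def pointwise_conv1x1(x, w):
    """x: (H, W, C_in); w: (C_in, C_out) 1x1 convolution mixing channels."""
    return x @ w

def maxpool2x2(x):
    """Non-overlapping 2x2 max pooling over the spatial dimensions."""
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).max(axis=(1, 3))

# Toy example (hypothetical sizes): a 16x16 feature map with 8 channels
# expanded to 16 channels by the pointwise layer, then pooled to 8x8.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((16, 16, 8))
dw_w = rng.standard_normal((3, 3, 8))
pw_w = rng.standard_normal((8, 16))
out = maxpool2x2(np.maximum(pointwise_conv1x1(depthwise_conv3x3(fmap, dw_w), pw_w), 0))
print(out.shape)  # (8, 8, 16)

Running the toy example prints (8, 8, 16): pooling halves the spatial size while the pointwise layer expands the channel count, which is the per-block behavior the accelerator's computation core is described as streaming through as data arrive.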