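Field-programmable gate array
Throughput
Computer science
Convolutional neural network
Artificial neural network
Systolic array
Feature (linguistics)
Energy consumption
Computational science
Computer hardware
Embedded system
Parallel computing
Artificial intelligence
Engineering
Very-large-scale integration
Wireless
Telecommunications
Linguistics
Philosophy
Electrical engineering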
Authors
Mingqiang Huang,Yucen Liu,Quan Cheng,Shuxin Yang,Kai Li,Junyi Luo,Zhengke Yang,Qiufeng Li,Hao Yu,Changhai Man
Identifier
DOI:10.1145/3490422.3502343
Abstract
Neural architecture search (NAS) optimized multi-bit-width convolutional neural networks (CNNs) balance network performance against efficiency, making them a promising approach to accurate yet energy-efficient edge computing. In this work, we propose a high-throughput three-dimensional (3D) systolic accelerator for NAS-optimized CNNs, in which the input feature matrix, weight matrix, and output feature matrix are delivered vertically, horizontally, and perpendicularly through the systolic array, respectively. With the 3D systolic data flow, both the processing time and the logic resource consumption are reduced compared to the classical non-stationary systolic array. In addition, a Booth-based multi-bit-width (INT2/4/8) multiply-add-accumulation (MAC) unit is developed within the 3D systolic accelerator. Deployed on the Xilinx ZCU102 FPGA platform, the peak performance of the convolutional layer reaches 2775 GOPS for INT2, 1650 GOPS for INT4, and 816 GOPS for INT8. The average performance when accelerating the full NAS VGG16 network is 647 GOPS.
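The abstract mentions a Booth-based MAC unit for INT2/4/8 operands but does not give its internals. As a rough software illustration of the general idea only, the sketch below uses radix-4 Booth recoding, which is an assumption here and not necessarily the recoding used in the paper. The function names `booth_radix4_digits`, `booth_multiply`, and `booth_mac` are hypothetical and chosen for this example.

```python
def booth_radix4_digits(y: int, bits: int) -> list[int]:
    """Recode a signed `bits`-wide integer into radix-4 Booth digits in {-2..2}.

    Assumption: radix-4 recoding; the paper's abstract does not state the radix.
    """
    if bits <= 0 or bits % 2 != 0:
        raise ValueError("bits must be a positive even number (e.g. 2, 4, 8)")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= y <= hi:
        raise ValueError(f"{y} does not fit in a signed {bits}-bit integer")

    u = y & ((1 << bits) - 1)  # two's-complement bit pattern of y
    digits = []
    prev = 0                   # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, bits: int) -> int:
    """Multiply x by a signed `bits`-wide y by summing shifted partial products."""
    total = 0
    for i, d in enumerate(booth_radix4_digits(y, bits)):
        total += (d * x) << (2 * i)  # each digit has weight 4**i
    return total


def booth_mac(acc: int, x: int, y: int, bits: int) -> int:
    """One multiply-accumulate step: acc + x * y, with y Booth-recoded."""
    return acc + booth_multiply(x, y, bits)


if __name__ == "__main__":
    # An INT8 weight needs four partial products, INT4 two, and INT2 one.
    for bits in (2, 4, 8):
        print(bits, len(booth_radix4_digits(-1, bits)))
    print(booth_mac(10, 3, -5, 4))
```

With this recoding, an 8-bit operand produces four partial products, a 4-bit operand two, and a 2-bit operand one. That scaling is one plausible reason narrower bit-widths can reach higher throughput on shared hardware, although the abstract itself does not state how the MAC unit is partitioned.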