Sungju Ryu, Hyungjun Kim, Won Ju Yi, Eunhwan Kim, Yulhwa Kim, Taesu Kim, Jae‐Joon Kim
Source
Journal: IEEE Journal of Solid-State Circuits [Institute of Electrical and Electronics Engineers]
Date: 2022-06-01
Volume (issue), pages: 57 (6): 1924-1935
Citations: 9
Identifier
DOI:10.1109/jssc.2022.3141050
Abstract
We introduce an area/energy-efficient precision-scalable neural network accelerator architecture. Previous precision-scalable hardware accelerators have limitations such as the under-utilization of multipliers for low bit-width operations and the large area overhead needed to support various bit precisions. To mitigate these problems, we first propose a bitwise summation, which reduces the area overhead of bit-width scaling. In addition, we present a channel-wise aligning scheme (CAS) to efficiently fetch inputs and weights from on-chip SRAM buffers, and a channel-first and pixel-last tiling (CFPL) scheme to maximize the utilization of multipliers across various kernel sizes. A test chip was implemented in 28-nm CMOS technology, and the experimental results show that the throughput and energy efficiency of our chip are up to 7.7× and 1.64× higher than those of the state-of-the-art designs, respectively. Moreover, an additional 1.5–3.4× throughput gain can be achieved using the CFPL method compared to the CAS.
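The abstract names bitwise summation but does not describe how it works. As background only, the Python sketch below illustrates the general arithmetic behind precision-scalable multiply-accumulate units: a wide dot product can be assembled from narrow sub-word products, and products that share the same bit significance can be summed before a single shift. This is an assumption about the general principle, not the paper's circuit; the function names, the 2-bit digit width, and the restriction to unsigned operands are all hypothetical choices made for illustration.

```python
def split_digits(value, bits, digit_bits=2):
    """Split an unsigned `bits`-wide integer into little-endian digits.

    Assumes `bits` is a multiple of `digit_bits` (illustrative restriction).
    """
    mask = (1 << digit_bits) - 1
    return [(value >> shift) & mask for shift in range(0, bits, digit_bits)]


def dot_shift_per_product(xs, ws, bits, digit_bits=2):
    """Dot product where every narrow partial product is shifted individually."""
    total = 0
    for x, w in zip(xs, ws):
        for i, xd in enumerate(split_digits(x, bits, digit_bits)):
            for j, wd in enumerate(split_digits(w, bits, digit_bits)):
                total += (xd * wd) << (digit_bits * (i + j))
    return total


def dot_sum_then_shift(xs, ws, bits, digit_bits=2):
    """Dot product where partial products of equal significance are summed
    across all operand pairs first, then shifted once per group."""
    n_digits = bits // digit_bits
    x_digits = [split_digits(x, bits, digit_bits) for x in xs]
    w_digits = [split_digits(w, bits, digit_bits) for w in ws]
    total = 0
    for i in range(n_digits):
        for j in range(n_digits):
            # All products in this group share the shift digit_bits * (i + j).
            group = sum(xd[i] * wd[j] for xd, wd in zip(x_digits, w_digits))
            total += group << (digit_bits * (i + j))
    return total


if __name__ == "__main__":
    xs = [200, 17, 255, 3]   # unsigned 8-bit inputs (made-up values)
    ws = [5, 250, 255, 128]  # unsigned 8-bit weights (made-up values)
    reference = sum(x * w for x, w in zip(xs, ws))
    print(dot_shift_per_product(xs, ws, bits=8) == reference)
    print(dot_sum_then_shift(xs, ws, bits=8) == reference)
```

Both functions return the same value as a direct integer dot product. The second needs one shift per significance group rather than one per partial product, which hints at why grouping by bit significance could reduce shifter hardware; whether the chip is organized this way is not stated in the abstract.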