Massimo Giordano, Giorgio Cristiano, Koji Ishibashi, Stefano Ambrogio, Hsinyu Tsai, Geoffrey W. Burr, Pritish Narayanan
Source
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems [Institute of Electrical and Electronics Engineers]. Date: 2019-04-16. Volume/Issue: 9 (2): 367-376. Cited by: 24
Identifier
DOI: 10.1109/jetcas.2019.2911537
Abstract
Hardware acceleration of deep neural networks (DNNs) using non-volatile memory arrays has the potential to achieve orders-of-magnitude power and performance benefits over digital von Neumann architectures by implementing the critical multiply-accumulate operations at the location of the weight data. However, realizing these system-level improvements requires careful consideration of the circuit-design tradeoffs involved. For instance, neuron circuitry at the periphery, in addition to accumulating current and providing mechanisms for routing, must also implement a non-linear activation function (for forward propagation) or its derivative (for reverse propagation). While it is possible to do this with analog-to-digital converters (ADCs) followed by digital arithmetic circuitry, this approach is power-hungry, suffers from undersampling, and can occupy a large area footprint. These large circuit blocks may therefore need to be time-multiplexed across multiple neurons, reducing the overall parallelism and diminishing the performance benefits. In this paper, we propose a new function-mapping ADC that directly implements non-linear functions as part of the conversion into the digital domain. The design is applicable to both inference and training, since it can implement both the activation function and its derivative with the same hardware. It supports fast, parallel conversion across all neuron values while remaining flexible and reconfigurable. We describe the design, followed by detailed circuit-level simulations demonstrating the viability and flexibility of the approach and quantifying its power and performance. The simulation results show a total conversion time of 207 ns for 512 neurons in parallel, with a total energy consumption of 9.95 nJ, corresponding to 19.4 pJ per neuron.
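To make the core idea concrete, below is a minimal behavioral sketch (Python/NumPy, not taken from the paper) of one way a ramp-based converter can fold a non-linear function into the conversion itself: the reference ramp is shaped as the inverse of a monotonic activation, so the counter value at the comparator trip point is already a uniformly quantized f(v_in). The function name, bit width, voltage range, and sigmoid gain are all illustrative assumptions, not the authors' circuit.

```python
import numpy as np

def function_mapping_adc(v_in, f, v_range=(-1.0, 1.0), n_bits=8):
    """Behavioral sketch of a single-slope ADC whose ramp is shaped
    as f^{-1}, so the output code is a quantized f(v_in).
    Illustrative only; not the paper's exact circuit."""
    n_codes = 2 ** n_bits
    lo, hi = v_range
    f_lo, f_hi = f(lo), f(hi)
    # Code k should fire when f(v_in) crosses the k-th uniform level of f.
    levels = f_lo + (np.arange(n_codes) + 0.5) * (f_hi - f_lo) / n_codes
    # Shaped ramp: numerically invert the monotonic f on a dense grid
    # (hardware would instead store per-step DAC increments in a small LUT).
    grid = np.linspace(lo, hi, 4096)
    ramp = np.interp(levels, f(grid), grid)   # ramp[k] = f^{-1}(levels[k])
    # Comparator trips at the first ramp step that exceeds the held input;
    # the counter value at that instant is the output code.
    return int(np.searchsorted(ramp, v_in))

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-4.0 * v))   # example activation
f_lo, f_hi = sigmoid(-1.0), sigmoid(1.0)
for v in (-0.8, 0.0, 0.8):
    code = function_mapping_adc(v, sigmoid)
    decoded = f_lo + code * (f_hi - f_lo) / 256      # code back to activation value
    print(f"v={v:+.1f}  code={code:3d}  decoded={decoded:.3f}  true={sigmoid(v):.3f}")
```

Under this sketch, reconfigurability amounts to reloading a different ramp table; supporting training with the derivative (which need not be monotonic) requires more than a shaped monotonic ramp, and the paper's design should be consulted for how both cases are handled with the same hardware.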