Keywords
artificial neural network; computer science; floating point; computer engineering; deep neural network; inference; exponent; point (geometry); artificial intelligence; computer hardware; algorithm; machine learning; parallel computing; mathematics; linguistics; philosophy; geometry
Authors
Jeongwoo Park, Sunwoo Lee, Dongsuk Jeon
Source
Journal: IEEE Journal of Solid-State Circuits
Publisher: Institute of Electrical and Electronics Engineers
Date: 2021-08-17
Volume/Issue: 57(3): 965-977
Cited by: 2
Identifier
DOI: 10.1109/jssc.2021.3103603
Abstract
Recent advances in deep neural networks (DNNs) and machine learning algorithms have driven demand for services that require a large number of computations, and specialized hardware ranging from data-center accelerators to on-device computing systems has been introduced. Low-precision math such as 8-bit integers has been used for energy-efficient neural network inference, but training with low-precision numbers without performance degradation has remained a challenge. To overcome this challenge, this article presents an 8-bit floating-point neural network training processor for state-of-the-art non-sparse neural networks. As naïve 8-bit floating-point numbers are insufficient for training DNNs robustly, two additional methods are introduced to ensure high-performance DNN training. First, a novel numeric system, which we dub 8-bit floating point with shared exponent bias (FP8-SEB), is introduced. Second, multiple-way fused multiply-add (FMA) trees are used in FP8-SEB's hardware implementation to ensure higher numerical precision and reduced energy consumption. The FP8-SEB format combined with multiple-way FMA trees is evaluated under various scenarios and shows trained-from-scratch performance that is close to or even surpasses that of networks trained with full precision (FP32). Our silicon-verified DNN training processor utilizes 24-way FMA trees implemented with FP8-SEB math and flexible 2-D routing schemes to show 2.48× higher energy efficiency than prior low-power neural network training processors and 78.1× lower energy than standard GPUs.
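The abstract names two ingredients: an 8-bit floating-point format with a shared exponent bias (FP8-SEB) and multi-way FMA-tree accumulation. The sketch below is a minimal NumPy illustration of both ideas, not the paper's implementation: the 1/4/3 sign/exponent/mantissa split, the max-magnitude rule for picking the shared bias, and the helper names `choose_shared_bias`, `quantize_fp8_seb`, and `fma_tree_dot` are all assumptions made for demonstration.

```python
# Minimal sketch of FP8 with a shared (per-tensor) exponent bias and
# group-wise FMA-tree accumulation. Bit layout and bias rule are assumed;
# the paper's exact format and hardware behavior differ.

import numpy as np

SIGN_BITS, EXP_BITS, MAN_BITS = 1, 4, 3          # assumed 8-bit split
EXP_MAX = 2 ** EXP_BITS - 1                      # number of exponent codes - 1

def choose_shared_bias(x: np.ndarray) -> int:
    """Pick one exponent bias for the whole tensor so that the largest
    magnitude in x lands near the top of the representable range
    (an assumed heuristic, not the paper's exact rule)."""
    max_mag = float(np.max(np.abs(x)))
    if max_mag == 0.0:
        return 0
    return int(np.floor(np.log2(max_mag))) - (EXP_MAX - 1)

def quantize_fp8_seb(x: np.ndarray, bias: int) -> np.ndarray:
    """Simulate FP8-SEB quantization: clamp each value's exponent to the
    bias-shifted range and round its mantissa to MAN_BITS bits."""
    sign = np.sign(x)
    mag = np.abs(x)
    out = np.zeros_like(x, dtype=np.float64)
    nz = mag > 0
    e = np.floor(np.log2(mag[nz]))               # per-value binary exponent
    e = np.clip(e, bias, bias + EXP_MAX - 1)     # shared-bias exponent window
    scale = 2.0 ** (e - MAN_BITS)                # quantization step per value
    out[nz] = sign[nz] * np.round(mag[nz] / scale) * scale
    return out

def fma_tree_dot(a: np.ndarray, b: np.ndarray, way: int = 24) -> float:
    """Dot product that multiplies `way` operand pairs at a time and sums
    each group in wider precision before accumulating, mimicking the
    multi-way FMA-tree accumulation described in the abstract."""
    acc = 0.0
    for i in range(0, len(a), way):
        group = a[i:i + way].astype(np.float64) * b[i:i + way].astype(np.float64)
        acc += float(np.sum(group))              # wide per-group accumulation
    return acc

# Example: quantize weight- and activation-like tensors, then take their dot product.
rng = np.random.default_rng(0)
w, x = rng.normal(size=48), rng.normal(size=48)
wq = quantize_fp8_seb(w, choose_shared_bias(w))
xq = quantize_fp8_seb(x, choose_shared_bias(x))
print("fp32 dot:", np.dot(w, x), " fp8-seb dot:", fma_tree_dot(wq, xq))
```

Running the example shows the quantized dot product tracking the FP32 result closely; the shared bias shifts the whole tensor into the narrow FP8 exponent window, while the group-wise wide accumulation limits rounding-error growth during the reduction.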