Computer science
Speedup
Scope (computer science)
Hardware acceleration
Efficient energy use
Throughput
Computer architecture
Range (aeronautics)
Computer hardware
Parallel computing
Embedded system
Field-programmable gate array
Operating system
Programming language
Engineering
Electrical engineering
Composite material
Materials science
Wireless
Authors
Yunji Chen, Zhiwei Xu, Ninghui Sun, Olivier Temam
Source
Journal: Communications of the ACM
[Association for Computing Machinery]
Date: 2016-10-28
Volume/Issue: 59 (11): 105-112
Citations: 121
Abstract
Machine Learning (ML) tasks are becoming pervasive in a broad range of applications, and in a broad range of systems (from embedded systems to data centers). As computer architectures evolve toward heterogeneous multi-cores composed of a mix of cores and hardware accelerators, designing hardware accelerators for ML techniques can simultaneously achieve high efficiency and broad application scope. While efficient computational primitives are important for a hardware accelerator, inefficient memory transfers can potentially void the throughput, energy, or cost advantages of accelerators, that is, an Amdahl's law effect, and thus, they should become a first-order concern, just like in processors, rather than an element factored in accelerator design on a second step. In this article, we introduce a series of hardware accelerators (i.e., the DianNao family) designed for ML (especially neural networks), with a special emphasis on the impact of memory on accelerator design, performance, and energy. We show that, on a number of representative neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip DaDianNao system (a member of the DianNao family).
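The abstract's point about memory transfers voiding accelerator gains is the classic Amdahl's law bound: only the accelerated fraction of runtime benefits, and the unaccelerated remainder (e.g., memory transfers) caps the overall speedup. A minimal sketch of that arithmetic, with illustrative numbers not taken from the paper:

```python
# Amdahl's-law estimate of overall speedup when only part of the
# workload (the compute kernels) is accelerated, while the rest
# (e.g., memory transfers) runs at the original speed.
# The fractions and speedup factor below are hypothetical examples.

def effective_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup of the whole workload under Amdahl's law."""
    return 1.0 / ((1.0 - accelerated_fraction)
                  + accelerated_fraction / kernel_speedup)

# If 90% of runtime is compute made 100x faster, the untouched 10%
# (memory traffic) caps the end-to-end gain at roughly 9x, not 100x.
print(round(effective_speedup(0.90, 100.0), 2))
```

This is why the abstract argues memory must be a first-order design concern: even an arbitrarily fast datapath cannot lift the overall speedup past 1/(1 - accelerated_fraction).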