Keywords: Computer science; Acceleration; Scope (computer science); Hardware acceleration; Energy efficiency; Throughput; Computer architecture; Computer hardware; Parallel computing; Embedded systems; Field-programmable gate arrays; Operating systems; Programming languages; Engineering; Electrical engineering
Authors
Yunji Chen, Zhiwei Xu, Ninghui Sun, Olivier Temam
Abstract
Machine Learning (ML) tasks are becoming pervasive in a broad range of applications, and in a broad range of systems (from embedded systems to data centers). As computer architectures evolve toward heterogeneous multi-cores composed of a mix of cores and hardware accelerators, designing hardware accelerators for ML techniques can simultaneously achieve high efficiency and broad application scope. While efficient computational primitives are important for a hardware accelerator, inefficient memory transfers can potentially void the throughput, energy, or cost advantages of accelerators (an Amdahl's law effect), and thus they should become a first-order concern, just as in processors, rather than an element factored into accelerator design as a second step. In this article, we introduce a series of hardware accelerators (i.e., the DianNao family) designed for ML (especially neural networks), with a special emphasis on the impact of memory on accelerator design, performance, and energy. We show that, on a number of representative neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip DaDianNao system (a member of the DianNao family).
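The "Amdahl's law effect" mentioned above can be made concrete with a small worked example. The sketch below (not from the paper; the function name and sample numbers are illustrative assumptions) computes the overall speedup when only the compute portion of runtime is accelerated while memory transfers remain at baseline speed:

```python
# Illustrative sketch of the Amdahl's-law bound: unaccelerated memory
# transfers cap an accelerator's end-to-end speedup, no matter how fast
# the computational primitives become.

def effective_speedup(compute_fraction: float, compute_speedup: float) -> float:
    """Overall speedup when only `compute_fraction` of baseline runtime
    is accelerated by `compute_speedup`x and the remainder (e.g., memory
    transfers) still runs at 1x."""
    return 1.0 / ((1.0 - compute_fraction) + compute_fraction / compute_speedup)

# Example: even a 100x compute accelerator yields under 5x overall if
# memory transfers account for 20% of the baseline runtime.
print(round(effective_speedup(0.8, 100.0), 2))  # → 4.81
```

This is why the DianNao family treats memory transfers as a first-order design concern: shrinking the unaccelerated fraction (here, the `1.0 - compute_fraction` term) matters more than further raising the compute speedup.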