Computer science
Convolutional neural network
Field-programmable gate array
Scalability
Kernel (algebra)
Computer engineering
Convolution (computer science)
FLOPS
Computer architecture
Set (abstract data type)
Key (lock)
Hardware acceleration
Deep learning
Parallel computing
Artificial intelligence
Artificial neural network
Computer hardware
Combinatorics
Database
Computer security
Mathematics
Programming language
Authors
Atul Rahman, Jongeun Lee, Ki‐Young Choi
Identifier
DOI:10.3850/9783981537079_0833
Abstract
Convolutional Deep Neural Networks (DNNs) are reported to show outstanding recognition performance in many image-related machine learning tasks. DNNs have a very high computational requirement, making hardware accelerators a very attractive option. These DNNs have many convolutional layers with different parameters in terms of input/output/kernel sizes as well as input stride. Design constraints usually require a single design for all layers of a given DNN, so a key challenge is how to design a common architecture that performs well for all convolutional layers of a DNN, which can be quite diverse and complex. In this paper we present a flexible yet highly efficient 3D neuron array architecture that is a natural fit for convolutional layers. We also present our technique to optimize its parameters, including on-chip buffer sizes, for a given set of resource constraints on modern FPGAs. Our experimental results targeting a Virtex-7 FPGA demonstrate that our proposed technique can generate DNN accelerators that outperform state-of-the-art solutions by 22% for 32-bit floating-point MAC implementations, and that are far more scalable in terms of compute resources and DNN size.
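The page carries no code, but the idea the abstract sketches — a 3D array of MAC units whose three dimensions are tiled over a layer's output width, output height, and output channels, with one fixed configuration serving every layer under FPGA resource limits — can be illustrated with a small design-space-exploration sketch. The Python below is a minimal, hypothetical model, not the authors' method: the layer list (AlexNet-like shapes), the DSP and BRAM budgets, and the cycle and buffer formulas are all simplifying assumptions.

```python
from itertools import product
from math import ceil

# Hypothetical AlexNet-like convolutional layer shapes:
# (output width, output height, input channels, output channels,
#  kernel size, stride) -- illustrative only.
LAYERS = [
    (55, 55,   3,  96, 11, 4),
    (27, 27,  96, 256,  5, 1),
    (13, 13, 256, 384,  3, 1),
    (13, 13, 384, 384,  3, 1),
    (13, 13, 384, 256,  3, 1),
]

DSP_BUDGET = 2800        # assumed DSP budget (Virtex-7 class device)
BRAM_BYTES = 4 * 2**20   # assumed on-chip buffer budget
WORD_BYTES = 4           # 32-bit floating-point words

def cycles(layer, px, py, pz):
    """Cycles for one layer on a px*py*pz MAC array: output width,
    output height, and output channels are tiled onto the three array
    dimensions; the remaining loops (input channels, kernel window)
    run sequentially."""
    ow, oh, ic, oc, k, _ = layer
    return ceil(ow / px) * ceil(oh / py) * ceil(oc / pz) * ic * k * k

def buffers_fit(layer, px, py, pz):
    """Do the input, weight, and output tiles for one array step fit
    in the on-chip buffer budget? (Simplified single-buffer model.)"""
    ow, oh, ic, oc, k, s = layer
    in_tile  = ((px - 1) * s + k) * ((py - 1) * s + k) * ic
    wt_tile  = k * k * ic * pz
    out_tile = px * py * pz
    return (in_tile + wt_tile + out_tile) * WORD_BYTES <= BRAM_BYTES

# Exhaustive search for one array shape shared by ALL layers,
# scoring candidates by total cycles across the whole network.
best = None
for px, py, pz in product((1, 2, 4, 7, 8, 13, 14, 16), repeat=3):
    if px * py * pz > DSP_BUDGET:          # one MAC per array element
        continue
    if not all(buffers_fit(l, px, py, pz) for l in LAYERS):
        continue
    total = sum(cycles(l, px, py, pz) for l in LAYERS)
    if best is None or total < best[0]:
        best = (total, (px, py, pz))

print("best array shape:", best[1], "total cycles:", best[0])
```

Even this toy search shows the tension the abstract points at: an array shape that divides one layer's dimensions evenly can leave much of the array idle on another, so candidates must be scored jointly across all layers rather than per layer.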