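Field-programmable gate array
Convolutional neural network
Computer science
Throughput
Construct (Python library)
Computer architecture
Cellular neural network
Deep learning
Computer engineering
Resource (disambiguation)
Virtex
Artificial intelligence
Embedded system
Artificial neural network
Distributed computing
Operating system
Computer network
Wireless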
Authors
Yongming Shen, Michael Ferdman, Peter Milder
Source
Venue: Field-Programmable Logic and Applications
Date: 2016-08-01
Citations: 47
Identifier
DOI: 10.1109/fpl.2016.7577315
Abstract
Convolutional neural networks (CNNs) are revolutionizing a variety of machine learning tasks, but they present significant computational challenges. Recently, FPGA-based accelerators have been proposed to improve the speed and efficiency of CNNs. Current approaches construct an accelerator optimized to maximize the overall throughput of iteratively computing the CNN layers. However, this approach leads to dynamic resource underutilization because the same accelerator is used to compute CNN layers of radically varying dimensions. We present a new CNN accelerator design that improves the dynamic resource utilization. Using the same FPGA resources, we build multiple accelerators, each specialized for specific CNN layers. Our design achieves 1.3× higher throughput than the state of the art when evaluating the convolutional layers of the popular AlexNet CNN on a Xilinx Virtex-7 FPGA.
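The underutilization argument can be illustrated with a small sketch. It assumes a common tiling model that the abstract does not spell out: an accelerator unrolls `tn` input feature maps and `tm` output feature maps in parallel, so a layer whose dimensions are not multiples of the tile sizes leaves multipliers idle. The function name `utilization` and the layer sizes are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from math import ceil

def utilization(n, m, tn, tm):
    """Fraction of a tn x tm multiplier array doing useful work on a
    layer with n input and m output feature maps (assumed tiling model)."""
    # The array is occupied for ceil(n/tn) * ceil(m/tm) tile passes,
    # each of which engages all tn * tm multipliers.
    occupied = ceil(n / tn) * tn * ceil(m / tm) * tm
    return (n * m) / occupied

# Hypothetical layers, given as (input maps, output maps).
shallow_layer = (3, 64)
deep_layer = (64, 128)

# One accelerator (8 x 64 = 512 multipliers) shared by both layers.
shared_shallow = utilization(*shallow_layer, tn=8, tm=64)
shared_deep = utilization(*deep_layer, tn=8, tm=64)

# A smaller accelerator shaped for the shallow layer (3 x 32 = 96 multipliers).
special_shallow = utilization(*shallow_layer, tn=3, tm=32)

print(shared_shallow, shared_deep, special_shallow)
```

Under these assumptions the shared array is fully used by the deep layer but mostly idle on the shallow one, while an array shaped to the shallow layer keeps all of its multipliers busy. This is the motivation the abstract gives for building several layer-specific accelerators from the same FPGA resources; how the paper actually partitions resources is not described in the abstract.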