Field-programmable gate array
Computer science
Convolution (computer science)
Matrix multiplication
Convolutional neural network
Massively parallel
Parallel computing
Gate array
Inference
Computer architecture
Multiplication (music)
Efficient energy use
Embedded system
Artificial intelligence
Artificial neural network
Quantum mechanics
Quantum
Electrical engineering
Physics
Engineering
Acoustics
Authors
Afzal Ahmad,Muhammad Adeel Pasha
Source
Journal: IEEE Transactions on Circuits and Systems II: Express Briefs
[Institute of Electrical and Electronics Engineers]
Date: 2020-01-09
Volume/Issue: 67 (11): 2692-2696
Cited by: 26
Identifier
DOI:10.1109/tcsii.2020.2965154
Abstract
Convolution is inarguably the most complex operation utilized in Convolutional Neural Networks (convnets). Owing to the billions of independent multiply-adds involved, convolution is being massively parallelized by the simultaneous utilization of many cores of Graphical Processing Units (GPUs). Although GPUs have shown significant performance improvements in both training and inference stages, they are not well-suited for mobile vision applications where both energy and real-time constraints need to be satisfied. In contrast, Field Programmable Gate Arrays (FPGAs) have demonstrated massive parallelization capabilities, with fast DSPs and on-chip memory, at a lower energy cost than GPUs. Hence, they are being utilized to design convnet accelerators for embedded applications. In this brief, we design an FPGA-based accelerator for general matrix-matrix multiplication (GeMM) to improve the efficiency of convolutional layers of Shufflenet, an efficient convnet architecture. Experimental results show significant performance improvements against the state-of-the-art FPGA-based implementations of both efficient convnets that are tailored towards mobile vision applications, and complex convnets that are used in traditional applications.
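The abstract describes accelerating convolutional layers by expressing them as general matrix-matrix multiplication (GeMM). As background, below is a minimal sketch (not taken from the paper; all function names and shapes are illustrative assumptions) of the standard im2col lowering, which turns one convolutional layer into a single GeMM, the operation an FPGA GeMM accelerator would evaluate.

```python
# Illustrative sketch: lowering a convolution to GeMM via im2col.
# Shapes and names are assumptions for this example, not the paper's code.
import numpy as np

def im2col(x, kh, kw):
    """Unroll a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix (stride 1, no padding)."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    idx = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # Each row of the matrix is one shifted view of the input channel.
                cols[idx] = x[ci, i:i + out_h, j:j + out_w].reshape(-1)
                idx += 1
    return cols

def conv_as_gemm(x, weights):
    """Convolution expressed as one GeMM: (M, C*kh*kw) @ (C*kh*kw, out_h*out_w)."""
    m, c, kh, kw = weights.shape            # M output channels
    a = weights.reshape(m, c * kh * kw)     # flattened filter matrix
    b = im2col(x, kh, kw)                   # unrolled activations
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    return (a @ b).reshape(m, out_h, out_w)

# Example: 8 filters of size 3x3 applied to a 16-channel 32x32 feature map.
x = np.random.rand(16, 32, 32).astype(np.float32)
w = np.random.rand(8, 16, 3, 3).astype(np.float32)
y = conv_as_gemm(x, w)
print(y.shape)  # (8, 30, 30)
```

Because the billions of multiply-adds in the resulting GeMM are independent, they map naturally onto the DSP slices and on-chip memory of an FPGA, which is the parallelization opportunity the brief exploits for ShuffleNet's convolutional layers.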