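Computer science
Field-programmable gate array
Mobile device
Coprocessor
Symmetric multiprocessor system
Artificial neural network
Computation
Process (computing)
Embedded system
Deep learning
Graphics
Graphics processing unit
General-purpose computing on graphics processing units
Computer architecture
Parallel computing
Artificial intelligence
Computer graphics (images)
Operating system
Algorithm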
Authors
Yuexuan Tu,Saad Sadiq,Yudong Tao,Mei‐Ling Shyu,Shu‐Ching Chen
Identifier
DOI:10.1109/iri.2019.00040
Abstract
Deep neural networks (DNNs) have seen tremendous industrial success in applications such as image recognition, machine translation, and audio processing. However, they require massive amounts of computation and take a long time to process. This quickly becomes a problem on mobile and handheld devices, where real-time multimedia applications such as face detection, disaster management, and CCTV require lightweight, fast, and effective computing solutions. The objective of this project is to use specialized devices such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) in a heterogeneous computing environment to accelerate deep learning computations under power-efficiency constraints. We investigate an efficient DNN implementation that uses the FPGA for the fully-connected layer and the GPU for floating-point operations. This requires the DNN architecture to be implemented as a model-parallel system, in which the DNN model is broken down and processed in a distributed fashion. The proposed heterogeneous framework is implemented using an Nvidia TX2 GPU and a Xilinx Artix-7 FPGA. Experimental results indicate that the proposed framework can achieve faster computation and much lower power consumption.
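The model-parallel split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`gpu_stage`, `fpga_stage`, `run_pipeline`), the toy layers, and the weights are all hypothetical, and the "devices" are only labels on ordinary Python functions, with no real GPU or FPGA calls. It shows only the general idea of assigning the floating-point layers to one device and the fully-connected layer to another, then passing activations from stage to stage.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def gpu_stage(x: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the floating-point layers placed on the GPU.

    Here it is just a scale by 0.5 followed by ReLU.
    """
    return [max(0.0, 0.5 * v) for v in x]


def make_fpga_stage(weights: Sequence[Sequence[float]],
                    bias: Sequence[float]) -> Callable[[Sequence[float]], Vector]:
    """Hypothetical stand-in for the fully-connected layer placed on the FPGA.

    Returns a function computing y = W x + b for the given weights and bias.
    """
    def fpga_stage(x: Sequence[float]) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return fpga_stage


def run_pipeline(x: Sequence[float],
                 stages: Sequence[Tuple[str, Callable[[Sequence[float]], Vector]]]
                 ) -> Tuple[Vector, List[str]]:
    """Run each model partition in order, recording which device label ran it."""
    trace: List[str] = []
    activations = list(x)
    for device, stage in stages:
        activations = stage(activations)  # hand activations to the next partition
        trace.append(device)
    return activations, trace


# Toy weights and input, chosen only for illustration.
W = [[1.0, 2.0, 3.0],
     [0.0, 1.0, -1.0]]
b = [0.5, 0.0]

stages = [("gpu", gpu_stage), ("fpga", make_fpga_stage(W, b))]
output, trace = run_pipeline([2.0, -4.0, 6.0], stages)
print(output, trace)
```

In a real heterogeneous system, each stage would run on its own device and the hand-off between stages would involve an inter-device data transfer, which this sketch does not model.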