Computer science
Field-programmable gate array
Convolutional neural network
Computation
Quantization (signal processing)
Embedded system
Design flow
Dataflow graph
Hardware acceleration
Artificial intelligence
Computer hardware
Computer engineering
Computer architecture
Algorithm
Database
Authors
Kaiyuan Guo, Lingzhi Sui, Jiantao Qiu, Jincheng Yu, Junbin Wang, Song Yao, Song Han, Yu Wang, Huazhong Yang
Identifiers
DOI: 10.1109/tcad.2017.2705069
Abstract
Convolutional neural network (CNN) has become a successful algorithm in the field of artificial intelligence and a strong candidate for many computer vision algorithms. However, the computational complexity of CNN is much higher than that of traditional algorithms. With the help of GPU acceleration, CNN-based applications are widely deployed in servers. For embedded platforms, however, CNN-based solutions are still too complex to be applied. Various dedicated hardware designs on field-programmable gate arrays (FPGAs) have been carried out to accelerate CNNs, but few of them explore the whole design flow for both fast deployment and high power efficiency. In this paper, we investigate state-of-the-art CNN models and CNN-based applications. Requirements on memory, computation, and the flexibility of the system are summarized for mapping CNNs onto embedded FPGAs. Based on these requirements, we propose Angel-Eye, a programmable and flexible CNN accelerator architecture, together with a data quantization strategy and a compilation tool. The data quantization strategy helps reduce the bit-width down to 8 bits with negligible accuracy loss. The compilation tool maps a given CNN model efficiently onto the hardware. Evaluated on the Zynq XC7Z045 platform, Angel-Eye is 6× faster and 5× better in power efficiency than a peer FPGA implementation on the same platform. Applications of the VGG network, pedestrian detection, and face alignment are used to evaluate our design on the Zynq XC7Z020. NVIDIA TK1 and TX1 platforms are used for comparison. Angel-Eye achieves similar performance and delivers up to 16× better energy efficiency.
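The abstract states that the data quantization strategy reduces the bit-width to 8 bits with negligible accuracy loss. Below is a minimal Python sketch of per-tensor dynamic fixed-point quantization to illustrate the general idea; the function name, the search over fractional bit-widths, and the sum-of-absolute-errors metric are assumptions for illustration, not the paper's exact procedure.

import numpy as np

def quantize_fixed_point(x, total_bits=8):
    # Search over fractional bit-widths and keep the one with the
    # smallest total quantization error (assumed metric; illustrative only).
    q_min, q_max = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    best_err, best_frac, best_q = None, None, None
    for frac_bits in range(total_bits):
        scale = 2.0 ** frac_bits
        q = np.clip(np.round(x * scale), q_min, q_max) / scale
        err = np.abs(q - x).sum()
        if best_err is None or err < best_err:
            best_err, best_frac, best_q = err, frac_bits, q
    return best_q, best_frac

# Hypothetical usage: quantize one layer's convolution weights.
w = np.random.randn(64, 3, 3, 3) * 0.1
w_q, frac = quantize_fixed_point(w, total_bits=8)
print("fractional bits:", frac, "max abs error:", np.abs(w - w_q).max())

Choosing the radix point per tensor in this way keeps both large- and small-magnitude values representable within 8 bits, which is consistent with the abstract's claim that the bit-width can be reduced with little accuracy loss.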