Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA

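Computer science; Field-programmable gate array; Convolutional neural network; Computation; Quantization (signal processing); Embedded system; Design flow; Data-flow diagram; Hardware acceleration; Artificial intelligence; Computer hardware; Computer engineering; Computer architecture; Algorithm; Database
Authors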
Kaiyuan Guo,Lingzhi Sui,Jiantao Qiu,Jincheng Yu,Junbin Wang,Song Yao,Song Han,Yu Wang,Huazhong Yang
Source
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems [Institute of Electrical and Electronics Engineers]
Volume (issue): 37 (1): 35-47 Citations: 483
Identifier
DOI:10.1109/tcad.2017.2705069
Abstract

Convolutional neural network (CNN) has become a successful algorithm in the region of artificial intelligence and a strong candidate for many computer vision algorithms. But the computation complexity of CNN is much higher than traditional algorithms. With the help of GPU acceleration, CNN-based applications are widely deployed in servers. However, for embedded platforms, CNN-based solutions are still too complex to be applied. Various dedicated hardware designs on field-programmable gate arrays (FPGAs) have been carried out to accelerate CNNs, while few of them explore the whole design flow for both fast deployment and high power efficiency. In this paper, we investigate state-of-the-art CNN models and CNN-based applications. Requirements on memory, computation and the flexibility of the system are summarized for mapping CNN on embedded FPGAs. Based on these requirements, we propose Angel-Eye, a programmable and flexible CNN accelerator architecture, together with data quantization strategy and compilation tool. Data quantization strategy helps reduce the bit-width down to 8-bit with negligible accuracy loss. The compilation tool maps a certain CNN model efficiently onto hardware. Evaluated on Zynq XC7Z045 platform, Angel-Eye is 6× faster and 5× better in power efficiency than peer FPGA implementation on the same platform. Applications of VGG network, pedestrian detection and face alignment are used to evaluate our design on Zynq XC7Z020. NVIDIA TK1 and TX1 platforms are used for comparison. Angel-Eye achieves similar performance and delivers up to 16× better energy efficiency.
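
The abstract states that quantization brings the bit-width down to 8 bits but does not describe the procedure. As a rough illustration only, the sketch below assumes a signed fixed-point format in which each layer picks its own number of fractional bits to minimise rounding error. This is an assumption for illustration, not necessarily the method used in the paper, and the function names (`quantize`, `dequantize`, `best_frac_bits`) are hypothetical.

```python
def quantize(values, frac_bits, bit_width=8):
    """Map real values to signed fixed-point integer codes.

    A code c represents the real number c / 2**frac_bits. Codes that fall
    outside the signed `bit_width` range are clipped (saturated).
    """
    scale = 2 ** frac_bits
    q_min = -(2 ** (bit_width - 1))
    q_max = 2 ** (bit_width - 1) - 1
    return [max(q_min, min(q_max, round(v * scale))) for v in values]


def dequantize(codes, frac_bits):
    """Recover the real values that the integer codes represent."""
    scale = 2 ** frac_bits
    return [c / scale for c in codes]


def best_frac_bits(values, bit_width=8, candidates=range(-8, 17)):
    """Choose the fractional length with the smallest total squared error.

    Assumed selection rule for this sketch: try every candidate length and
    keep the first one that gives the minimum reconstruction error.
    """
    def error(frac_bits):
        approx = dequantize(quantize(values, frac_bits, bit_width), frac_bits)
        return sum((v - a) ** 2 for v, a in zip(values, approx))

    return min(candidates, key=error)


# Hypothetical weights for one layer.
weights = [0.5, -0.25, 0.125, 0.75]
frac_bits = best_frac_bits(weights)
codes = quantize(weights, frac_bits)
print(frac_bits, codes, dequantize(codes, frac_bits))
```

Choosing the fractional length per layer lets layers with small weights keep more precision while layers with large activations avoid saturation, which is one common way to keep accuracy loss low at 8 bits.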