Topics
Computer science, Interleaving, Inference, Scalability, Exploitation, Computer engineering, Parallel computing, Floating point, Energy efficiency, Hardware acceleration, Deep learning, Artificial intelligence, Algorithm, Acceleration, Computer architecture, Field-programmable gate array, Computer hardware, Database, Operating system, Computer security, Electrical engineering, Engineering
Authors
Hongqian Lu, Liang Chang, Chenglong Li, Zixuan Zhu, Shengjian Lu, Yanhuan Liu, Mingzhe Zhang
Source
Venue: International Symposium on Microarchitecture (MICRO)
Date: 2021-10-17
Citations: 13
Identifier
DOI: 10.1145/3466752.3480123
Abstract
Along with the rapid evolution of deep neural networks, their ever-increasing complexity imposes formidable computational intensity on the hardware accelerator. In this paper, we propose a novel computing philosophy called "bit interleaving" and the associated accelerator design called "Bitlet" to maximally exploit bit-level sparsity. Unlike existing bit-serial/parallel accelerators, Bitlet leverages the abundant "sparsity parallelism" in the parameters to accelerate inference. Bitlet is versatile, supporting diverse precisions on a single platform, including 32-bit floating point and fixed point from 1b to 24b. This versatility makes Bitlet feasible for both efficient inference and training. Empirical studies on 12 domain-specific deep learning applications highlight the following results: (1) up to 81×/21× energy efficiency improvement for training/inference over recent high-performance GPUs; (2) up to 15×/8× higher speedup/efficiency over state-of-the-art fixed-point accelerators; (3) 1.5 mm² area and scalable power consumption, from 570 mW (float32) to 432 mW (16b) and 365 mW (8b), at TSMC 28 nm; (4) high configurability, justified by ablation and sensitivity studies.
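The "bit-level sparsity" the abstract refers to is the abundance of zero bits in fixed-point weights: a multiplication only needs to accumulate shifted activations for the nonzero ("essential") bits. The sketch below is a minimal software illustration of that arithmetic identity, not Bitlet's actual datapath; the function name and the serial loop are illustrative assumptions.

```python
def essential_bit_dot(weights, activations):
    """Dot product that touches only the nonzero bits of each weight.

    For a fixed-point weight w, the product w * a equals the sum of
    (a << i) over the set bits i of w, so zero bits cost nothing.
    Bitlet's "bit interleaving" schedules essential bits from many
    weights concurrently in hardware; this serial loop only
    demonstrates the underlying identity.
    """
    acc = 0
    for w, a in zip(weights, activations):
        sign = -1 if w < 0 else 1
        w = abs(w)
        shift = 0
        while w:
            if w & 1:                       # essential (nonzero) bit
                acc += sign * (a << shift)  # add shifted activation
            w >>= 1
            shift += 1
    return acc

# Sanity check against the ordinary dot product.
ws, xs = [5, -3, 0, 12], [2, 7, 9, 1]
assert essential_bit_dot(ws, xs) == sum(w * x for w, x in zip(ws, xs))
```

In hardware, the same identity allows essential bits distilled from many weights to be processed in parallel, which is the "sparsity parallelism" the abstract claims Bitlet exploits.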