Keywords
Computer science, Residual, Artificial neural network, Computer engineering, Computation, Convolutional neural network, Scheduling (production processes), Control reconfiguration, Parallel computing, Algorithm, Artificial intelligence, Embedded systems, Operations management, Economics
Authors
Jiale Yan,Shouyi Yin,Fengbin Tu,Leibo Liu,Shaojun Wei
Identifier
DOI:10.1109/tcad.2018.2857258
Abstract
Generative networks have become ubiquitous in image generation applications such as image super-resolution, image-to-image translation, and text-to-image synthesis. They are usually composed of convolutional (CONV) layers, convolution-based residual blocks, and deconvolutional (DeCONV) layers. Previous work on neural network acceleration has focused largely on optimizing CONV layer computation, e.g., through data reuse or parallel computation, but suffers from low processing element (PE) utilization when computing residual blocks and DeCONV layers: residual blocks require very high memory bandwidth to perform elementwise additions on residual paths, and DeCONV layers have imbalanced operation counts across different outputs. In this paper, we propose a dual convolution mapping method for CONV and DeCONV layers that makes full use of the available PE resources. We also propose a cross-layer scheduling method that avoids extra off-chip memory accesses during residual block processing. Precision-adaptive PEs and buffer bandwidth reconfiguration are used to support flexible bitwidths for both inputs and weights in deep neural networks. We implement a generative network accelerator (GNA) based on intra-PE processing, inter-PE processing, and cross-layer scheduling techniques. Owing to the proposed optimization techniques, GNA achieves an energy efficiency of 2.05 TOPS/W with 61% higher PE utilization than traditional methods for generative network acceleration.
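The imbalanced operation counts the abstract attributes to DeCONV layers can be made concrete with a small sketch. The following Python snippet is not from the paper; the function name and the parameters (input length, kernel size, stride) are illustrative. It counts how many multiply-accumulates land on each output of a 1-D transposed convolution, showing why PEs mapped one-per-output see uneven workloads.

```python
# A minimal sketch (hypothetical, not the paper's method): in a transposed
# convolution, input element i scatters its kernel over outputs
# [i*S, i*S + K - 1], so different outputs accumulate different numbers
# of multiply-accumulates (MACs).

def deconv_op_counts(n_in: int, kernel: int, stride: int) -> list[int]:
    """Count the MACs landing on each output of a 1-D DeCONV layer."""
    n_out = (n_in - 1) * stride + kernel
    counts = [0] * n_out
    for i in range(n_in):            # every input element...
        for k in range(kernel):      # ...scatters its whole kernel
            counts[i * stride + k] += 1
    return counts

if __name__ == "__main__":
    # Kernel size 4 with stride 2, a common DeCONV configuration in
    # generative networks: interior outputs receive 2 MACs while border
    # outputs receive only 1, a 2x load imbalance across PEs.
    print(deconv_op_counts(n_in=6, kernel=4, stride=2))
    # -> [1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1]
```

Under these assumptions, a naive output-per-PE mapping leaves the border PEs idle half the time, which is the utilization gap the dual convolution mapping method is proposed to close.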