Quantization (signal processing)
Inference
Convolutional neural network
Computer science
Deep learning
Acceleration
Artificial intelligence
Artificial neural network
Computer engineering
Algorithm
Parallel computing
Authors
Yun Wang, Qiang Liu, Shun Yan
Identifier
DOI:10.1109/fccm53951.2022.9786195
Abstract
Post-training compression with quantization is a common technique for improving the efficiency of embedded neural network accelerators. In this paper, a Dynamic Quantization in Inference (DQI) method is proposed to solve the severe quantization overflow problem that may occur in the CNN inference process. Based on an analysis of the quantization errors of activation values in convolutional layers, efficient quantization overflow detection and dynamic update of quantization parameters are designed and implemented in a CNN accelerator. Evaluation results on the VGG16 and MobileNetV2 models demonstrate that DQI can improve inference accuracy by up to 11.59% in high-overflow scenarios, while the overhead in hardware resources and runtime remains acceptable.
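The abstract describes the general idea of detecting activation quantization overflow and updating the quantization parameters on the fly. The following is a minimal conceptual sketch of that idea in Python, not the paper's hardware implementation: the scale value, the overflow threshold, and the function names are illustrative assumptions, and the rescaling rule (re-deriving the scale from the observed activation maximum) is one simple possibility rather than the authors' DQI design.

```python
import numpy as np

def quantize(x, scale, num_bits=8):
    """Quantize a float tensor to signed integers with a given scale.

    Returns the clipped integer tensor and the fraction of values that
    fell outside the representable range (the overflow rate).
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = np.round(x / scale)
    overflow_rate = np.mean((q < qmin) | (q > qmax))
    return np.clip(q, qmin, qmax).astype(np.int8), overflow_rate

def dynamic_quantize(activations, scale, num_bits=8, overflow_thresh=0.01):
    """Illustrative dynamic update: if too many activations overflow the
    current quantization range, enlarge the scale so the observed maximum
    fits, then re-quantize with the updated parameter."""
    q, overflow_rate = quantize(activations, scale, num_bits)
    if overflow_rate > overflow_thresh:
        # Hypothetical update rule: derive the scale from the actual range.
        scale = np.max(np.abs(activations)) / (2 ** (num_bits - 1) - 1)
        q, _ = quantize(activations, scale, num_bits)
    return q, scale

# Example: activations larger than the assumed range trigger a scale update.
acts = np.random.randn(1, 64, 14, 14).astype(np.float32) * 6.0
q, new_scale = dynamic_quantize(acts, scale=0.02)
```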