Quantization (signal processing)
Computer science
Regularization (linguistics)
Artificial neural network
Algorithm
Artificial intelligence
Authors
Vladimir Chikin, Kirill Solodskikh, Irina Zhelavskaya
Identifier
DOI:10.1007/978-3-031-19775-8_1
Abstract
While quantization of Deep Neural Networks (DNNs) leads to a significant reduction in computational and storage costs, it reduces model capacity and therefore usually leads to an accuracy drop. One possible way to overcome this issue is to use different quantization bit-widths for different layers. The main challenge of the mixed-precision approach is to define the bit-width for each layer while staying within memory and latency requirements. Motivated by this challenge, we introduce a novel technique for explicit complexity control of DNNs quantized to mixed precision, which uses smooth optimization on a surface containing neural networks of constant size. Furthermore, we introduce a family of smooth quantization regularizers, which can be used jointly with our complexity control method for both post-training mixed-precision quantization and quantization-aware training. Our approach can be applied to any neural network architecture. Experiments show that the proposed techniques reach state-of-the-art results.
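To make the size-budget idea in the abstract concrete, the sketch below illustrates, under stated assumptions, how a per-layer bit-width assignment determines total model size in bits and how a smooth, differentiable penalty can pull weights toward a uniform quantization grid. It is a minimal illustration, not the authors' algorithm: the function names, the sinusoidal penalty, and the toy numbers are assumptions made for this example.

```python
# Illustrative sketch only (not the paper's method).
# (1) total storage cost of a mixed-precision bit-width assignment;
# (2) a generic smooth surrogate penalty that is zero exactly on a
#     uniform quantization grid and differentiable everywhere.
import numpy as np

def model_size_bits(layer_param_counts, bit_widths):
    """Total storage cost (in bits) of a per-layer bit-width assignment."""
    return sum(n * b for n, b in zip(layer_param_counts, bit_widths))

def smooth_quant_penalty(weights, bit_width, w_max):
    """Smooth penalty toward a uniform grid on [-w_max, w_max].

    sin^2(pi * w / step) vanishes on grid points and can be added to a
    training loss as a differentiable quantization regularizer.
    """
    step = 2.0 * w_max / (2 ** bit_width - 1)  # uniform grid step
    return np.mean(np.sin(np.pi * weights / step) ** 2)

# Toy example: two layers quantized to 8 and 4 bits.
counts = [10_000, 50_000]
bits = [8, 4]
print("model size:", model_size_bits(counts, bits), "bits")

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=1000)
print("penalty:", smooth_quant_penalty(w, bit_width=4, w_max=0.4))
```

Lowering a layer's bit-width reduces the size term but typically increases the penalty needed to keep its weights near the coarser grid, which is the trade-off a mixed-precision search has to balance.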