Computer science
Field-programmable gate array (FPGA)
Pipeline (software)
Compression ratio
Computer engineering
Support vector machine
Algorithm
Parallel computing
Computer hardware
Artificial intelligence
Automotive engineering
Engineering
Programming language
Internal combustion engine
Source
Journal: IEEE Access
[Institute of Electrical and Electronics Engineers]
Date: 2023-01-01
Volume 11, pp. 122357-122367
Cited by: 2
Identifier
DOI: 10.1109/access.2023.3329048
Abstract
Long Short-Term Memory (LSTM) and its variants have been widely adopted in many sequential learning tasks, such as speech recognition and machine translation. The low-latency and energy-efficiency requirements of real-world applications make model compression and hardware acceleration for LSTM an urgent need. In this paper, we first propose a weight parameter generation method based on vector construction that gives the model a higher compression ratio while causing less accuracy degradation. Furthermore, we study in detail the influence of the construction vector's size on computational complexity, model compression ratio, and accuracy, in order to obtain the optimal interval for this size. Moreover, we design a linear transformation method and a convolution method to reduce the dimension of the input sequence, so that the model can be applied to training sets of different dimensions without changing the size of its construction vectors. Finally, we use high-level synthesis (HLS) to deploy the resulting LSTM inference model on an FPGA device, using parallel pipelined operation to enable resource reuse. Experiments show that, compared with the block circulant matrix method, the designs generated by our framework achieve up to 2x higher compression at the same accuracy degradation, with acceptable latency; at the same compression ratio, our accuracy loss is 45% of that of the block circulant matrix method.
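The abstract does not detail the rule that expands a construction vector into weight values. The sketch below uses a placeholder cyclic-shift expansion (structurally the same as the block-circulant baseline) purely to illustrate the storage arithmetic: a length-k vector standing in for a k x k block yields a compression ratio of k for the weight matrices. The function names and the expansion rule are hypothetical, not from the paper.

```python
import numpy as np

def expand_block(v: np.ndarray) -> np.ndarray:
    """Expand a length-k construction vector into a k x k weight block.

    Hypothetical expansion rule (the abstract does not specify one):
    row i is the vector cyclically shifted by i positions. Any fixed,
    parameter-free rule gives the same saving: k stored values stand
    in for k*k weights.
    """
    k = len(v)
    return np.stack([np.roll(v, i) for i in range(k)])

def build_weight_matrix(vectors: np.ndarray, n: int, k: int) -> np.ndarray:
    """Assemble an n x n weight matrix from (n//k)**2 construction vectors."""
    assert n % k == 0
    b = n // k
    W = np.zeros((n, n))
    for i in range(b):
        for j in range(b):
            W[i*k:(i+1)*k, j*k:(j+1)*k] = expand_block(vectors[i*b + j])
    return W

# Storage: (n/k)^2 * k = n^2 / k parameters instead of n^2,
# i.e. a compression ratio of k; larger k compresses more but
# constrains the weights more strongly.
n, k = 8, 4
vecs = np.random.randn((n // k) ** 2, k)
W = build_weight_matrix(vecs, n, k)
print(W.shape, "compression ratio:", n * n / vecs.size)  # (8, 8) 4.0
```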
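The abstract also names two input dimension reducers, a linear transformation and a convolution, used so that inputs of any dimension d can feed a model whose construction-vector size stays fixed. Below is a minimal sketch of both, assuming a learned n x d projection and a strided 1-D kernel; the names, shapes, and the specific kernel are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def reduce_linear(x: np.ndarray, W_proj: np.ndarray) -> np.ndarray:
    """Map a d-dim input to n dims with a learned linear projection (n x d)."""
    return W_proj @ x

def reduce_conv1d(x: np.ndarray, kernel: np.ndarray, stride: int) -> np.ndarray:
    """Strided 1-D convolution as an alternative dimension reducer."""
    k = len(kernel)
    out_len = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i*stride : i*stride + k], kernel)
                     for i in range(out_len)])

d, n = 20, 8                              # input dim d, model dim n
x = np.random.randn(d)
W_proj = np.random.randn(n, d)            # hypothetical learned projection
print(reduce_linear(x, W_proj).shape)              # (8,)
print(reduce_conv1d(x, np.ones(6) / 6, stride=2).shape)  # (8,)
```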
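On the hardware side, the abstract states that the HLS deployment uses parallel pipelined operation to reuse resources. The software analogue below streams every weight block through one shared expand/multiply/accumulate path, regenerating weights on the fly from their construction vectors; in an HLS design this inner loop would map to a pipelined kernel whose compute cost scales with k rather than n^2. This is a conceptual sketch reusing the placeholder expansion rule from above, not the authors' HLS code.

```python
import numpy as np

def block_matvec(vectors: np.ndarray, x: np.ndarray, n: int, k: int) -> np.ndarray:
    """y = W @ x, with W generated block-wise from construction vectors.

    Every block reuses the same expand/multiply/accumulate datapath, so
    no full n x n matrix is ever materialized (placeholder cyclic-shift
    expansion assumed, as above).
    """
    b = n // k
    y = np.zeros(n)
    for i in range(b):                      # output block row
        acc = np.zeros(k)
        for j in range(b):                  # stream input blocks through one datapath
            v = vectors[i * b + j]
            block = np.stack([np.roll(v, s) for s in range(k)])  # regenerate weights
            acc += block @ x[j * k:(j + 1) * k]
        y[i * k:(i + 1) * k] = acc
    return y

n, k = 8, 4
vecs = np.random.randn((n // k) ** 2, k)
x = np.random.randn(n)
print(block_matvec(vecs, x, n, k).shape)   # (8,)
```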