Computer science
Speech recognition
Huffman coding
Artificial intelligence
Speaker recognition
Chip
Computer hardware
Pattern recognition (psychology)
Data compression
Telecommunications
Authors
Xuanhao Zhang,Hui Kou,Chenjie Xia,Hao Cai,Bo Liu
Identifier
DOI:10.1109/icassp48485.2024.10447203
Abstract
Traditional automatic speech recognition (ASR) models are difficult to deploy on edge devices because of their high computational and storage requirements. To address this issue, we present a novel ASR system designed specifically for edge applications, combining keyword spotting (KWS) and speaker verification (SV) with on-chip learning for speaker registration. The proposed system employs a compact model trained with a two-stage transfer learning method for on-the-fly small-sample speaker registration. In the proposed model, sparsity-controllable weights are symmetrically ternary-quantized to further exploit data reuse. Additionally, we introduce a Huffman-coding-based lossy weight compression method to achieve efficient storage compaction. Moreover, we propose a specialized classifier that takes the signal-to-noise ratio into account to enhance the accuracy of SV. The proposed ASR system has been successfully deployed on a 1.65 mm² custom chip fabricated in 28-nm technology, with only 10.84 KB of on-chip memory. This compact system effectively handles KWS and SV tasks, as well as on-chip speaker registration.
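As a rough illustration of two ideas named in the abstract, symmetric ternary quantization with a controllable sparsity level and Huffman coding of the resulting weights, the following Python sketch may help. It is not the authors' implementation. The function names, the magnitude-based zeroing rule and the sample weights are all assumptions made for illustration, and because the abstract does not describe the lossy step of the compression method, ordinary lossless Huffman coding is used instead.

```python
import heapq
from collections import Counter


def ternarize(weights, sparsity):
    """Map weights to {-1, 0, +1} (hypothetical helper).

    The `sparsity` fraction of weights with the smallest magnitude is set
    to 0; the remaining weights keep only their sign.
    """
    n_zero = int(len(weights) * sparsity)
    # Indices ordered by magnitude; the first n_zero of them are zeroed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    zeroed = set(order[:n_zero])
    out = []
    for i, w in enumerate(weights):
        if i in zeroed or w == 0:
            out.append(0)
        else:
            out.append(1 if w > 0 else -1)
    return out


def huffman_code(symbols):
    """Build a plain (lossless) Huffman code: symbol -> bit string."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table).
    heap = [(c, i, {s: ""}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c0, _, t0 = heapq.heappop(heap)
        c1, _, t1 = heapq.heappop(heap)
        # Merge the two least frequent subtrees, prefixing their codes.
        merged = {s: "0" + code for s, code in t0.items()}
        merged.update({s: "1" + code for s, code in t1.items()})
        heapq.heappush(heap, (c0 + c1, tie, merged))
        tie += 1
    return heap[0][2]


def encoded_bits(symbols, code):
    """Total number of bits needed to store `symbols` under `code`."""
    return sum(len(code[s]) for s in symbols)


if __name__ == "__main__":
    # Made-up example weights, not taken from the paper.
    w = [0.9, -0.05, 0.02, -0.7, 0.01, 0.3, -0.02, 0.04]
    t = ternarize(w, sparsity=0.5)
    code = huffman_code(t)
    print(t, code, encoded_bits(t, code))
```

In this sketch, raising the sparsity level makes the symbol 0 more frequent, so Huffman coding assigns it a shorter code than the fixed 2 bits a naive ternary encoding would use per weight.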