计算机科学
对抗制
语音识别
稳健性(进化)
人工智能
语音增强
字错误率
公制(单位)
瓶颈
特征(语言学)
一般化
深度学习
模式识别(心理学)
降噪
数学
工程类
数学分析
生物化学
化学
运营管理
语言学
哲学
基因
嵌入式系统
作者
Chaitanya Jannu,Sunny Dayal Vanambathina
标识
DOI:10.1142/s0219467824500530
摘要
Recent research has identified adversarial examples which are the challenges to DNN-based ASR systems. In this paper, we propose a new model based on Convolutional GRU and Self-attention U-Net called [Formula: see text] to improve adversarial speech signals. To represent the correlation between neighboring noisy speech frames, a two-Layer GRU is added in the bottleneck of U-Net and an attention gate is inserted in up-sampling units to increase the adversarial stability. The goal of using GRU is to combine the weights sharing technique with the use of gates to control the flow of data across multiple feature maps. As a result, it outperforms the original 1D convolution used in [Formula: see text]. Especially, the performance of the model is evaluated by explainable speech recognition metrics and its performance is analyzed by the improved adversarial training. We used adversarial audio attacks to perform experiments on automatic speech recognition (ASR). We saw (i) the robustness of ASR models which are based on DNN can be improved using the temporal features grasped by the attention-based GRU network; (ii) through adversarial training, including some additive adversarial data augmentation, we could improve the generalization power of Automatic Speech Recognition models which are based on DNN. The word-error-rate (WER) metric confirmed that the enhancement capabilities are better than the state-of-the-art [Formula: see text]. The reason for this enhancement is the ability of GRU units to extract global information within the feature maps. Based on the conducted experiments, the proposed [Formula: see text] increases the score of Speech Transmission Index (STI), Perceptual Evaluation of Speech Quality (PESQ), and the Short-term Objective Intelligibility (STOI) with adversarial speech examples in speech enhancement.
科研通智能强力驱动
Strongly Powered by AbleSci AI