Self-Attention-Based Convolutional GRU for Enhancement of Adversarial Speech Examples

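Computer science; Adversarial system; Speech recognition; Robustness (evolution); Artificial intelligence; Speech enhancement; Word error rate; Metric (unit); Bottleneck; Feature (linguistics); Generalization; Deep learning; Pattern recognition (psychology); Noise reduction; Mathematics; Engineering; Gene; Mathematical analysis; Philosophy; Embedded system; Biochemistry; Linguistics; Chemistry; Operations management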

Authors

Chaitanya Jannu, Sunny Dayal Vanambathina

Source

Journal: International Journal of Image and Graphics [World Scientific]
Date: 2023-07-08 Volume (issue): 24 (06) Citations: 1

Identifier

DOI: 10.1142/s0219467824500530

Abstract

Recent research has shown that adversarial examples pose a challenge to DNN-based automatic speech recognition (ASR) systems. In this paper, we propose a new model based on a convolutional GRU and a self-attention U-Net, called [Formula: see text], to enhance adversarial speech signals. To represent the correlation between neighboring noisy speech frames, a two-layer GRU is added to the bottleneck of the U-Net, and an attention gate is inserted in the up-sampling units to increase adversarial stability. The GRU combines weight sharing with gating to control the flow of data across multiple feature maps. As a result, it outperforms the original 1D convolution used in [Formula: see text]. In particular, the model is evaluated with explainable speech recognition metrics, and its performance is analyzed under improved adversarial training. We ran experiments on ASR using adversarial audio attacks. We observed that (i) the robustness of DNN-based ASR models can be improved using the temporal features captured by the attention-based GRU network, and (ii) adversarial training, including additive adversarial data augmentation, can improve the generalization of DNN-based ASR models. The word-error-rate (WER) metric confirmed that the enhancement capabilities are better than those of the state-of-the-art [Formula: see text]. This improvement is attributed to the ability of GRU units to extract global information within the feature maps. In the experiments, the proposed [Formula: see text] increases the Speech Transmission Index (STI), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI) scores on adversarial speech examples.
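
The abstract relies on word error rate (WER) to compare enhancement quality. As a rough illustration of how that metric is usually defined (word-level edit distance divided by the number of reference words), here is a minimal Python sketch. It is not the authors' evaluation code: the function name `word_error_rate` is hypothetical, and the sketch assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both transcripts are tokenized on whitespace, with no
    case folding or punctuation stripping.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("turn on the light", "turn off the light"))  # 0.25
```

Under this definition, a lower WER on transcripts of the enhanced audio indicates that the enhancement step removed more of the adversarial perturbation's effect on the recognizer.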
