Self-Attention-Based Convolutional GRU for Enhancement of Adversarial Speech Examples

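Computer science, Adversarial system, Speech recognition, Robustness (evolution), Artificial intelligence, Speech enhancement, Word error rate, Metric (unit), Bottleneck, Feature (linguistics), Generalization, Deep learning, Pattern recognition (psychology), Noise reduction, Mathematics, Engineering, Gene, Mathematical analysis, Philosophy, Embedded system, Biochemistry, Linguistics, Chemistry, Operations management
Authors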
Chaitanya Jannu,Sunny Dayal Vanambathina
Source
Journal: International Journal of Image and Graphics [World Scientific]
Volume (Issue): 24 (06) Citations: 1
Identifier
DOI:10.1142/s0219467824500530
Abstract

Recent research has identified adversarial examples as a challenge to DNN-based automatic speech recognition (ASR) systems. In this paper, we propose a new model based on a Convolutional GRU and a Self-attention U-Net, called [Formula: see text], to enhance adversarial speech signals. To represent the correlation between neighboring noisy speech frames, a two-layer GRU is added to the bottleneck of the U-Net, and an attention gate is inserted in the up-sampling units to increase adversarial stability. The GRU combines weight sharing with gating to control the flow of data across multiple feature maps. As a result, it outperforms the original 1D convolution used in [Formula: see text]. In particular, the model is evaluated with explainable speech recognition metrics, and its performance is analyzed under improved adversarial training. We ran experiments on ASR using adversarial audio attacks. We observed that (i) the robustness of DNN-based ASR models can be improved using the temporal features captured by the attention-based GRU network, and (ii) adversarial training, including additive adversarial data augmentation, improves the generalization of DNN-based ASR models. The word-error-rate (WER) metric confirmed that the enhancement capabilities are better than those of the state-of-the-art [Formula: see text]. This improvement stems from the ability of the GRU units to extract global information within the feature maps. In the conducted experiments, the proposed [Formula: see text] increases the Speech Transmission Index (STI), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI) scores on adversarial speech examples in speech enhancement.
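The abstract uses word error rate (WER) as its ASR metric. As a minimal sketch of how WER is conventionally computed (word-level edit distance divided by the number of reference words), and not the authors' evaluation code, the hypothetical helper `word_error_rate` below assumes whitespace tokenization and no text normalization; the transcripts in the usage line are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace, with no case folding or
    punctuation stripping. Real evaluations usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substituted word out of four reference words.
print(word_error_rate("turn on the light", "turn off the light"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.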
