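Computer science
Dependency (UML)
News aggregator
Speech recognition
Spoofing attack
Speech synthesis
Convolution (computer science)
Artificial intelligence
Pattern recognition (psychology)
Artificial neural network
Computer network
Operating system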
Authors
Xiaohui Liu, Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Jianwu Dang
Identifier
DOI:10.1109/icassp49357.2023.10096278
Abstract
Automatic speaker verification (ASV) systems are vulnerable to spoofing attacks. Because synthetic speech exhibits both local and global artifacts compared to natural speech, incorporating local-global dependency should lead to better anti-spoofing performance. To this end, we propose Rawformer, which leverages position-related local-global dependency for synthetic speech detection. Our method uses two-dimensional convolution to capture local dependency and a Transformer to capture global dependency. Specifically, we design a novel positional aggregator that integrates local-global dependency by adding positional information and applying a flattening strategy that loses less information. Furthermore, we propose the squeeze-and-excitation Rawformer (SE-Rawformer), which introduces a squeeze-and-excitation operation to better capture local dependency. The results demonstrate that the proposed SE-Rawformer achieves a 37% relative improvement over the state-of-the-art single system on ASVspoof 2019 LA and generalizes well to ASVspoof 2021 LA. In particular, using the positional aggregator in the SE-Rawformer brings a 43% improvement on average.
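The abstract names the squeeze-and-excitation operation without describing it. As a rough illustration only, the sketch below shows the generic squeeze-and-excitation idea (global average pooling per channel, a small bottleneck with ReLU and sigmoid, then channel-wise rescaling) in plain Python. It is not the authors' implementation: the function name `squeeze_excite`, the nested-list layout, and the weight matrices `w1` and `w2` are assumptions made for this example, and biases are omitted for brevity.

```python
import math


def squeeze_excite(x, w1, w2):
    """Generic squeeze-and-excitation over a feature map (illustrative sketch).

    x  : list of C channels, each an H x W list of lists of floats
    w1 : hypothetical bottleneck weights, shape (C // r) x C
    w2 : hypothetical expansion weights, shape C x (C // r)
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

    # Excitation, step 1: bottleneck linear layer followed by ReLU.
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]

    # Excitation, step 2: expansion linear layer followed by sigmoid,
    # giving one gate in (0, 1) per channel.
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h)))) for row in w2]

    # Scale: multiply every value in a channel by that channel's gate.
    y = [[[v * g for v in row] for row in ch] for ch, g in zip(x, s)]
    return y, s


# Usage with two 2 x 2 channels and a reduction ratio of 2 (assumed values).
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]]]
scaled, gates = squeeze_excite(features, [[1.0, 1.0]], [[0.5], [-0.5]])
```

The gates let the network emphasize informative channels of the convolutional feature map and suppress others, which is the usual motivation for pairing this operation with convolutional layers.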