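Computer science
Spoofing attack
Speech recognition
Normalization (sociology)
Feature extraction
Discriminative
Pattern recognition (psychology)
Word error rate
Artificial intelligence
Data mining
Anthropology
Computer network
Sociology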
Authors
Matan Karo,Arie Yeredor,Itshak Lapidot
Identifier
DOI:10.1109/taslp.2023.3341000
Abstract
Anti-spoofing is the task of speech authentication, that is, distinguishing genuine human speech from spoofed speech. The main focus of this paper is to propose new representations for genuine and spoofed speech, based on estimating the probability mass function (PMF) of the audio waveforms' amplitude. We introduce a new feature extraction method for speech audio signals: unlike traditional methods, it directly processes the time-domain audio samples. The PMF is exploited by designing a feature extractor based on several PMF distance and similarity measures. As an additional step, we apply filterbank preprocessing, which significantly affects the discriminative characteristics of the features and makes it convenient to visualize possible clustering of spoofing attacks. Furthermore, we use diffusion maps to reveal the underlying manifold on which the data lie. The proposed embeddings allow simple linear separators to achieve an Equal Error Rate (EER) of 12.99% on the ASVspoof2019 Logical Access (LA) test set for female samples, and 12.09% for male samples. In addition, we present a convenient way to visualize the data, which helps to assess the efficiency of different spoofing techniques. We also present a reduced-complexity embedding method based on compander quantization, which in some cases even improves the test-set EER by up to 3.00%. The experimental results show the potential of multichannel PMF-based features for the anti-spoofing task, as well as the benefits of diffusion maps both as an analysis tool and as an embedding tool.
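To make the idea of multichannel PMF-based features concrete, here is a minimal Python sketch. It is not the authors' implementation: the ideal (brick-wall) FFT filterbank, the band edges, the 64-bin amplitude histogram, the peak normalization, the "reference PMFs" each utterance is compared against, and the choice of Jensen-Shannon divergence, Hellinger distance, and Bhattacharyya coefficient as the distance/similarity measures are all assumptions for illustration, and every function name is hypothetical.

```python
import numpy as np

def fft_filterbank(x, fs, edges):
    """Split x into sub-band signals using ideal (brick-wall) FFT masks.

    Simplifying assumption for illustration; a practical system might use
    Butterworth, gammatone, or mel-spaced filters instead.
    """
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)  # keep only bins in [lo, hi)
        bands.append(np.fft.irfft(X * mask, n=len(x)))
    return np.stack(bands)  # shape: (n_channels, len(x))

def amplitude_pmf(x, n_bins=64):
    """Histogram estimate of the amplitude PMF after peak-normalizing to [-1, 1]."""
    peak = np.max(np.abs(x))
    x = x / peak if peak > 0 else x
    counts, _ = np.histogram(x, bins=n_bins, range=(-1.0, 1.0))
    return counts / counts.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with a small floor so empty bins do not produce log(0)."""
    p = (p + eps) / np.sum(p + eps)
    q = (q + eps) / np.sum(q + eps)
    return float(np.sum(p * np.log(p / q)))

def js_divergence(p, q):
    """Jensen-Shannon divergence (symmetric, in nats)."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def hellinger(p, q):
    """Hellinger distance between two PMFs, in [0, 1]."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def bhattacharyya(p, q):
    """Bhattacharyya coefficient (a similarity), equal to 1 for identical PMFs."""
    return float(np.sum(np.sqrt(p * q)))

def multichannel_pmf_features(x, fs, edges, references, n_bins=64):
    """Compare each channel's amplitude PMF with that channel's reference PMFs.

    references[c] is a list of reference PMFs for channel c (hypothetical,
    e.g. average PMFs of bona fide and of spoofed training utterances).
    """
    feats = []
    for band, refs in zip(fft_filterbank(x, fs, edges), references):
        p = amplitude_pmf(band, n_bins)
        for r in refs:
            feats.extend([js_divergence(p, r), hellinger(p, r), bhattacharyya(p, r)])
    return np.array(feats)

# Usage on synthetic signals (stand-ins for real utterances).
rng = np.random.default_rng(0)
fs = 16000
edges = [0, 1000, 4000, 8000]  # three hypothetical sub-bands, in Hz
reference = [[amplitude_pmf(b)] for b in fft_filterbank(rng.normal(size=fs), fs, edges)]
features = multichannel_pmf_features(rng.laplace(size=fs), fs, edges, reference)
print(features.shape)  # (9,): 3 channels x 1 reference x 3 measures
```

The feature vector length grows as channels × references × measures, so the filterbank is what turns a single PMF into a "multichannel" representation; how many bands and which measures work best is an empirical question this sketch does not answer.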
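Diffusion maps, used here both for analysis and for embedding, can be sketched with the standard construction: a Gaussian kernel on pairwise distances, row normalization into a Markov matrix P = D^{-1}K, and the leading non-trivial eigenvectors of P scaled by their eigenvalues. The Gaussian kernel, the median-distance bandwidth heuristic, the diffusion time `t`, and the function name are assumptions; the paper's exact kernel and parameters are not given in this abstract.

```python
import numpy as np

def diffusion_map(X, n_components=2, epsilon=None, t=1):
    """Diffusion-map embedding of the rows of X (one feature vector per utterance).

    Gaussian kernel, Markov normalization P = D^{-1} K, and eigendecomposition
    via the symmetric matrix A = D^{-1/2} K D^{-1/2}, which has the same eigenvalues.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    if epsilon is None:
        epsilon = np.median(d2)  # bandwidth heuristic (an assumption)
    K = np.exp(-d2 / epsilon)
    d = K.sum(axis=1)  # node degrees
    A = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(A)  # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(d)[:, None]  # right eigenvectors of P
    # Skip the trivial constant eigenvector; scale by lambda^t (diffusion time t).
    return psi[:, 1:n_components + 1] * evals[1:n_components + 1] ** t

# Usage: two well-separated synthetic clusters (e.g. two attack types).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(10.0, 0.1, (20, 2))])
emb = diffusion_map(X, n_components=2)
print(emb.shape)  # (40, 2)
```

Working through the symmetric matrix lets `numpy.linalg.eigh` be used instead of a general eigensolver; if A v = λv, then P(D^{-1/2}v) = λ D^{-1/2}v, which is why the eigenvectors are rescaled by 1/sqrt(d).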
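The abstract does not say which compander the reduced-complexity method uses; the sketch below assumes μ-law companding (μ = 255, as in telephony) followed by uniform quantization to a small number of levels, so that the PMF is estimated over, say, 16 non-uniform amplitude levels instead of a fine uniform histogram. The level count and function names are hypothetical.

```python
import numpy as np

def mu_law_compress(x, mu=255.0):
    """mu-law compander: maps [-1, 1] to [-1, 1], expanding resolution near zero."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def compander_pmf(x, n_levels=16, mu=255.0):
    """PMF over n_levels non-uniform amplitude levels (compress, then quantize uniformly)."""
    peak = np.max(np.abs(x))
    x = x / peak if peak > 0 else x
    y = mu_law_compress(x, mu)
    q = np.round((y + 1.0) / 2.0 * (n_levels - 1)).astype(int)  # integers in [0, n_levels - 1]
    return np.bincount(q, minlength=n_levels) / q.size

rng = np.random.default_rng(0)
speech_like = rng.laplace(scale=0.05, size=16000)  # synthetic stand-in, peaked at zero
p = compander_pmf(speech_like, n_levels=16)
print(p.shape)  # (16,)
```

Because speech amplitudes concentrate near zero, a coarse uniform histogram would place most samples in a few central bins; companding spreads them over more levels, which is one plausible reason a short PMF can remain informative.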
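The EER figures above are the operating point where the false-acceptance rate on spoofed speech equals the false-rejection rate on bona fide speech. A minimal sketch of computing it from classifier scores (for example, the output of a linear separator) follows; the score convention and the threshold sweep over observed scores are assumptions, and this is not the official ASVspoof 2019 scoring script.

```python
import numpy as np

def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER by sweeping the threshold over all observed scores.

    Assumed convention: higher score = more likely bona fide; accept if score >= threshold.
    """
    bonafide_scores = np.asarray(bonafide_scores, dtype=float)
    spoof_scores = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])    # spoofs accepted
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])  # bona fide rejected
    i = np.argmin(np.abs(far - frr))
    return 0.5 * (far[i] + frr[i])

print(equal_error_rate([2.0, 3.0], [0.0, 1.0]))  # 0.0 (perfectly separated)
print(equal_error_rate([0.0, 1.0], [0.0, 1.0]))  # 0.5 (indistinguishable)
```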