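Computer science
Correlation
Speech recognition
Artificial intelligence
Natural language processing
Pattern recognition (psychology)
Mathematics
Geometry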
Authors
Chen Chen,Bohan Dai,B. Mathura Bai,Deyun Chen
Identifier
DOI:10.1016/j.asoc.2024.111413
Abstract
Synthetic speech is becoming increasingly prevalent, and automatic speaker verification (ASV) systems are vulnerable to attacks that use it. However, most current synthetic speech detection methods rely on a single feature. Since different features each capture, to some extent, the difference between real and synthetic speech, there must be common information among different types of features. Effectively finding and fully exploiting this information facilitates the extraction of more discriminative features and improves performance. Based on this analysis, we propose a deep correlation network (DCN) to learn the latent common information between different embeddings. It consists of two parts: a bi-parallel network and a correlation learning network. The bi-parallel network consists of different neural models that learn middle-level representations from front-end acoustic features. The correlation learning network is the core of the DCN and explores the common information between these middle-level features. The common information obtained after DCN processing has better discriminative ability for synthetic speech detection. Experimental results show that the proposed DCN significantly improves the performance of synthetic speech detection systems on the ASVspoof 2019 and ASVspoof 2021 logical access sub-challenges.
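The abstract does not specify how the correlation learning network measures common information, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a plain Pearson cross-correlation between the dimensions of two embedding batches, standing in for the outputs of the two parallel branches. The names `pearson`, `cross_correlation`, `emb_a`, and `emb_b` are hypothetical, and the embeddings are random toy data rather than real acoustic features.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant dimension carries no correlation signal
    return cov / math.sqrt(vx * vy)


def cross_correlation(emb_a, emb_b):
    """Correlate every dimension of emb_a with every dimension of emb_b.

    emb_a: list of N vectors of size Da (hypothetical branch-A embeddings)
    emb_b: list of N vectors of size Db (hypothetical branch-B embeddings)
    Returns a Da x Db matrix of Pearson coefficients.
    """
    cols_a = list(zip(*emb_a))  # transpose to per-dimension columns
    cols_b = list(zip(*emb_b))
    return [[pearson(ca, cb) for cb in cols_b] for ca in cols_a]


# Toy data (assumption): one latent factor is shared by both branches.
random.seed(0)
shared = [random.gauss(0, 1) for _ in range(200)]
# Branch A: dimension 0 carries the shared factor, dimension 1 is noise.
emb_a = [[s + 0.1 * random.gauss(0, 1), random.gauss(0, 1)] for s in shared]
# Branch B: dimension 0 is noise, dimension 1 carries the shared factor.
emb_b = [[random.gauss(0, 1), 2.0 * s + 0.1 * random.gauss(0, 1)] for s in shared]

corr = cross_correlation(emb_a, emb_b)
for row in corr:
    print(["%.2f" % value for value in row])
```

In this toy setup, the entry pairing the two dimensions that carry the shared factor is close to 1, while entries involving a pure-noise dimension stay near 0. A learned correlation network would presumably optimize such a signal end to end rather than compute it after the fact.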