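Mel cepstrum
Speech recognition
Computer science
Principal component analysis
Noise (video)
Cepstrum
Pattern recognition (psychology)
Speaker recognition
Artificial intelligence
Feature extraction
Image (mathematics)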
Authors
Huapeng Wang,Cuiling Zhang
Identifier
DOI:10.1080/00450618.2019.1584830
Abstract
Compared with humans, who have more powerful auditory ability in discriminating and identifying speakers in noisy environments, traditional forensic automatic speaker recognizers do not perform well when dealing with noisy recordings. This paper proposes a GMM-UBM Forensic Automatic Speaker Recognition (FASR) System to reduce the effect of noise on performance. The system uses Gammatone Frequency Cepstral Coefficients (GFCC) based on an auditory periphery model and also incorporates a Principal Component Analysis (PCA) algorithm. The system was tested and validated using Mandarin voice databases compromised with different levels of white noise and office noise. The performance of the system was compared with a baseline system using Mel Frequency Cepstral Coefficients (MFCC) and also PCA under the same conditions. The results show that the performance of the combined GFCC system achieved a substantial improvement when compared with the baseline MFCC system under conditions of a high level of office noise.
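As a rough illustration of the GMM-UBM scoring idea named in the abstract, the sketch below computes an average per-frame log-likelihood ratio between a speaker model and a universal background model (UBM). It assumes diagonal-covariance Gaussian mixtures and feature frames that have already been extracted (for example GFCC or MFCC vectors after PCA). The function names, the toy model parameters, and the two-dimensional frames are all hypothetical and are not taken from the paper; the authors' actual implementation may differ.

```python
import math

def log_gmm_density(frame, weights, means, variances):
    """Log-density of one feature frame under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(frame, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        component_logs.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))

def average_llr(frames, speaker_gmm, ubm):
    """Mean per-frame log-likelihood ratio: speaker model versus UBM."""
    total = 0.0
    for frame in frames:
        total += log_gmm_density(frame, *speaker_gmm) - log_gmm_density(frame, *ubm)
    return total / len(frames)

# Toy models as (weights, means, variances); hypothetical values, not from the paper.
ubm = ([0.5, 0.5], [[0.0, 0.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])
speaker = ([0.5, 0.5], [[1.0, 1.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])

# Frames close to the speaker model's first component.
frames = [[1.0, 1.0], [0.9, 1.1]]
score = average_llr(frames, speaker, ubm)
```

A positive score means the frames are better explained by the speaker model than by the UBM; a negative score means the opposite. In a full system the speaker model would typically be adapted from a trained UBM rather than written by hand as it is here.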