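Computer science
Estimator
Artificial neural network
Speech recognition
Context (archaeology)
Mel-frequency cepstrum
Artificial intelligence
Representation (politics)
Voice activity detection
Mean squared error
Pattern recognition (psychology)
Recurrent neural network
Machine learning
Speech processing
Feature extraction
Statistics
Mathematics
Political science
Law
Paleontology
Politics
Biology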
Authors
Dushyant Sharma, Aidan O. T. Hogg, Yu Wang, Amr H. Nour-Eldin, Patrick A. Naylor
Identifier
DOI:10.23919/eusipco.2019.8902646
Abstract
Estimating the quality of speech without the use of a clean reference signal is a challenging problem, in part due to the time and expense required to collect sufficient training data for modern machine learning algorithms. We present a novel, non-intrusive estimator that exploits recurrent neural network architectures to predict the intrusive POLQA score of a speech signal in a short time context. The predictor is based on a novel compressed representation of modulation domain features, used in conjunction with static MFCC features. We show that the proposed method can reliably predict POLQA with a 300 ms context, achieving a mean absolute error of 0.21 on unseen data. The proposed method is trained using English speech and is shown to generalize well across unseen languages. The neural network also jointly estimates the mean voice activity detection (VAD) with an F1 accuracy score of 0.9, removing the need for an external VAD.
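The abstract reports two evaluation metrics: mean absolute error for the predicted POLQA score and an F1 score for voice activity detection. The following is a minimal sketch of how these two metrics are conventionally computed. It is not the authors' evaluation code; the function names and all sample values are hypothetical and chosen only for illustration.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute difference between predicted and reference scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def f1_score(predicted, reference):
    """F1 score for binary labels (1 = speech, 0 = non-speech)."""
    pairs = list(zip(predicted, reference))
    true_pos = sum(1 for p, r in pairs if p == 1 and r == 1)
    false_pos = sum(1 for p, r in pairs if p == 1 and r == 0)
    false_neg = sum(1 for p, r in pairs if p == 0 and r == 1)
    if true_pos == 0:
        return 0.0  # no correct speech detections, so F1 is zero
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Made-up illustrative values, NOT results from the paper.
polqa_reference = [3.0, 2.5, 4.0, 1.5]   # hypothetical intrusive POLQA scores
polqa_predicted = [3.5, 2.0, 4.0, 2.0]   # hypothetical non-intrusive estimates
vad_reference = [1, 1, 0, 1, 0, 0]       # hypothetical frame-level speech labels
vad_predicted = [1, 0, 0, 1, 1, 0]       # hypothetical frame-level VAD decisions

mae = mean_absolute_error(polqa_predicted, polqa_reference)
f1 = f1_score(vad_predicted, vad_reference)
print(f"MAE = {mae:.3f}, F1 = {f1:.3f}")
```

A lower mean absolute error means the estimate tracks the reference POLQA score more closely, while an F1 score closer to 1 means the speech/non-speech decisions balance precision and recall well.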