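Mel cepstrum
Speech recognition
Loudness
Discriminator
Computer science
Artificial intelligence
Feature extraction
Computer vision
Telecommunications
Detector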
Authors
Shivani Yadav,Merugu Keerthana,Dipanjan Gope,Uma Maheswari K.,Prasanta Ghosh
Identifier
DOI:10.1109/icassp40776.2020.9054062
Abstract
Non-speech sounds (cough, wheeze) are typically known to perform better than speech sounds for asthmatic and healthy subject classification. In this work, we use sustained phonations of speech sounds, namely, /ɑ:/, /i:/, /u:/, /eI/, /ou/, /s/, and /z/, from 47 asthmatic subjects and 48 healthy controls. We consider INTERSPEECH 2013 Computational Paralinguistics Challenge baseline (ISCB) acoustic features for the classification task, as they provide a rich set of characteristics of the speech sounds. Mel-frequency cepstral coefficients (MFCC) are used as the baseline features. The classification accuracy using ISCB improves over MFCC for all voiced speech sounds, with the highest classification accuracy of 75.4% (18.28% better than baseline) for /ou/. The exhale achieves the highest classification accuracy of 77.8% (4.2% better than baseline). Comparable accuracies using the speech sound /ou/ and the non-speech exhale indicate the benefit of the rich acoustic features from ISCB. An analysis of 21 ISCB feature groups using forward feature group selection shows that the loudness and MFCC groups contribute the most in the case of /ou/, with the interquartile range between the 2nd and 3rd quartiles of the loudness feature being the best discriminator feature.
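The abstract does not spell out how forward feature group selection was run, so the following is only a minimal sketch of the generic greedy procedure, not the authors' implementation. The function name `forward_group_selection`, the `score` callback (standing in for something like cross-validated classification accuracy), the stopping rule, and the toy gain values are all assumptions made for illustration; none of the numbers come from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def forward_group_selection(
    groups: Sequence[str],
    score: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Greedily add the feature group that most improves the score.

    Returns the groups in selection order, each paired with the score
    reached once that group was added. Stops when no remaining group
    improves on the best score so far (an assumed stopping rule).
    """
    selected: List[str] = []
    history: List[Tuple[str, float]] = []
    remaining = list(groups)
    best_score = float("-inf")
    while remaining:
        # Score every one-group extension of the current selection.
        candidates = [(score(selected + [g]), g) for g in remaining]
        cand_score, cand_group = max(candidates, key=lambda c: c[0])
        if cand_score <= best_score:
            break  # no candidate helps, so stop adding groups
        selected.append(cand_group)
        remaining.remove(cand_group)
        best_score = cand_score
        history.append((cand_group, cand_score))
    return history


def toy_score(subset: List[str]) -> float:
    """Hypothetical scorer with made-up gains, for demonstration only."""
    gains = {"loudness": 0.10, "mfcc": 0.06, "jitter": -0.01}
    return 0.5 + sum(gains[g] for g in subset)


history = forward_group_selection(["mfcc", "jitter", "loudness"], toy_score)
selected_order = [name for name, _ in history]
```

In practice `score` would train and evaluate a classifier on the features belonging to the given groups. The order in which groups are selected then gives a rough ranking of how much each group contributes.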