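Convolutional neural network
Computer science
Pattern recognition (psychology)
Artificial intelligence
Speech recognition
Robustness (evolution)
Biochemistry
Chemistry
Gene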
Authors
Deli Fu,Xue‐Hui Zhang,Dandan Chen,Weiping Hu
Identifier
DOI:10.1016/j.jvoice.2022.08.028
Abstract
Nonlinear dynamic features can effectively describe the acoustic characteristics of normal and pathological voices. This paper uses phase space reconstruction and a convolutional neural network to classify normal and pathological voices. The phase space of each voice signal is reconstructed from a delay time and an embedding dimension, which converts the one-dimensional signal into a two-dimensional matrix, and a reconstructed trajectory graph is generated as a sample. The trajectory graphs are fed to a VGG-like convolutional neural network, which extracts graphical features to classify normal and pathological voices. A data augmentation scheme is used to compensate for the scarcity of clinical data. Classification experiments are carried out separately on three pathological voice databases: the Massachusetts Eye and Ear Infirmary (MEEI) database, the Saarbrücken Voice Database (SVD), and a clinical database collected by the authors. With five-fold cross-validation, the average recognition rates on the three databases are 99.42%, 97.30%, and 95.88%, respectively. For three-class classification of normal, vocal fold paralysis, and vocal fold non-paralysis voices, the average recognition rates are 96.04% on MEEI and 92.27% on SVD. The results show that the method achieves high recognition rates and good robustness, and suggest that it is broadly applicable to recognizing normal and pathological voices.
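The reconstruction step described in the abstract can be illustrated with a minimal time-delay embedding sketch. This is not the authors' implementation: the function names `delay_embed` and `trajectory_image`, the delay of 10 samples, the embedding dimension of 2, the 64 x 64 raster size, and the synthetic test signal are all assumptions chosen for illustration. The abstract does not state how the delay time and embedding dimension are selected or how the trajectory graph is rendered.

```python
import numpy as np

def delay_embed(signal, delay, dim):
    """Return the time-delay embedding of a 1-D signal as an (n_points, dim) matrix."""
    x = np.asarray(signal, dtype=float)
    n_points = x.size - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for this delay and embedding dimension")
    # Column j holds the signal shifted by j * delay samples.
    return np.stack([x[j * delay : j * delay + n_points] for j in range(dim)], axis=1)

def trajectory_image(points, bins=64):
    """Rasterize the first two embedding coordinates into a binary bins x bins image.

    Hypothetical rendering step: a 2-D histogram stands in for whatever plotting
    procedure produces the trajectory graphs in the paper.
    """
    hist, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins)
    return (hist > 0).astype(np.uint8)

# Synthetic stand-in for a sustained vowel (assumed values, not real voice data).
t = np.arange(2000) / 16000.0
x = np.sin(2 * np.pi * 150 * t) + 0.3 * np.sin(2 * np.pi * 300 * t)

emb = delay_embed(x, delay=10, dim=2)   # one row per point on the reconstructed trajectory
img = trajectory_image(emb)             # image that could be passed to a CNN classifier
```

Each row of `emb` is one point of the reconstructed trajectory, and `img` is the kind of two-dimensional sample a convolutional network could take as input. In practice the delay and dimension would be estimated from the data rather than fixed by hand.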