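Stuttering
Computer science
Audiology
Spectrogram
Convolutional neural network
Speech recognition
Population
Repetition (rhetorical device)
Artificial intelligence
Medicine
Linguistics
Environmental health
Philosophy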
Authors
Abedal-Kareem Al-Banna, Eran A. Edirisinghe, Hui Fang
Identifier
DOI:10.1109/icics55353.2022.9811183
Abstract
Stuttering is a neurodevelopmental speech disorder that affects 70 million people worldwide, approximately 1% of the world population. People who stutter (PWS) share common speech symptoms: blocks, interjections, repetitions, and prolongations. Speech-language pathologists (SLPs) commonly observe these four groups of symptoms to evaluate stuttering severity. The evaluation process is tedious and time-consuming for both SLPs and PWS. Therefore, this paper proposes a new model for stuttering event detection that may help SLPs evaluate stuttering severity. Our model is based on a log mel spectrogram and a 2D atrous convolutional network designed to learn spectral and temporal features. We rigorously evaluate the performance of our model on two stuttering datasets (UCLASS and FluencyBank) using common speech metrics, i.e., F1-score, recall, and the area under the curve (AUC). Our experimental results indicate that our model outperforms state-of-the-art methods on prolongation, with F1 scores of 52% and 44.5% on the UCLASS and FluencyBank datasets, respectively. We also gain margins of 5% and 3% on the UCLASS and FluencyBank datasets, respectively, for the fluent class.
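For readers unfamiliar with the term, the sketch below illustrates what a 2D atrous (dilated) convolution computes over a time-by-frequency input such as a log mel spectrogram. It is not the authors' architecture: the function name `atrous_conv2d`, the dilation rates, the all-ones kernel, and the toy input are all hypothetical choices made only for illustration, and the code uses plain Python lists rather than a deep learning framework.

```python
def atrous_conv2d(x, kernel, rate_t=1, rate_f=1):
    """Valid 2D dilated (atrous) cross-correlation on nested lists.

    x      : input of shape [time][frequency], e.g. a log mel spectrogram
    kernel : filter weights of shape [kt][kf]
    rate_t, rate_f : dilation rates along the time and frequency axes
                     (hypothetical parameter names, for illustration only)
    """
    kt, kf = len(kernel), len(kernel[0])
    # Effective span covered by the dilated kernel on each axis.
    span_t = (kt - 1) * rate_t + 1
    span_f = (kf - 1) * rate_f + 1
    out_t = len(x) - span_t + 1
    out_f = len(x[0]) - span_f + 1
    if out_t <= 0 or out_f <= 0:
        raise ValueError("input is smaller than the dilated kernel")
    out = [[0.0] * out_f for _ in range(out_t)]
    for i in range(out_t):
        for j in range(out_f):
            acc = 0.0
            for a in range(kt):
                for b in range(kf):
                    # Kernel taps are spaced `rate` cells apart in the input.
                    acc += kernel[a][b] * x[i + a * rate_t][j + b * rate_f]
            out[i][j] = acc
    return out


# Toy 5x5 "spectrogram" whose entry at (t, f) is 10*t + f.
toy = [[10.0 * t + f for f in range(5)] for t in range(5)]
ones = [[1.0] * 3 for _ in range(3)]

dense = atrous_conv2d(toy, ones)                      # 3x3 receptive field
dilated = atrous_conv2d(toy, ones, rate_t=2, rate_f=2)  # 5x5 receptive field
```

With the same nine weights, the dilated call covers a 5x5 region of the input instead of 3x3, which is why dilation is a common way to widen temporal and spectral context without adding parameters.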