TB-MFCC multifuse feature for emergency vehicle sound classification using multistacked CNN – Attention BiLSTM

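Keywords: Overfitting, Mel-frequency cepstrum, Computer science, Convolutional neural network, Feature extraction, Pattern recognition (psychology), Feature (linguistics), Artificial intelligence, Speech recognition, Mean squared error, Artificial neural network, Audio signal, Noise (video), Mathematics, Statistics, Philosophy, Speech coding, Image (mathematics), Linguistics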

Authors

T. M. Nithya, P. Dhivya, S. N. Sangeethaa, P. Rajesh Kanna

Source

Journal: Biomedical Signal Processing and Control [Elsevier BV]
Date: 2023-11-04; Volume: 88, article no. 105688; Cited by: 6

Identifiers

DOI：10.1016/j.bspc.2023.105688

Abstract

Emergency vehicles such as ambulances, fire engines, and police cruisers play a vital role in society by responding quickly to emergencies, helping to prevent loss of life, and maintaining order. Identifying and classifying vehicle sounds is important in cities so that emergency vehicles can be recognized easily and traffic can be cleared effectively. Convolutional neural networks (CNNs) play an important role in accurately classifying vehicles during an emergency. The main aim of this paper is to develop suitable models and algorithms for data augmentation, feature extraction, and classification. The proposed TB-MFCC multifuse feature comprises data augmentation and feature extraction. First, in the proposed signal augmentation, noise injection, time stretching, time shifting, and pitch shifting are applied separately to each audio signal, which increases the number of instances in the dataset. This augmentation reduces overfitting in the network. Second, Triangular Bluestein Mel Frequency Cepstral Coefficients (TB-MFCC) are proposed and fused with Zero Crossing Rate (ZCR), Mel-frequency cepstral coefficients (MFCC), Root Mean Square (RMS), Chroma, and Tempogram features to extract precise features, which increases the accuracy and reduces the Mean Squared Error (MSE) of the model during classification. Finally, the proposed Multi-stacked Convolutional Neural Network (MCNN) with Attention-based Bidirectional Long Short-Term Memory (A-BiLSTM) improves the modeling of nonlinear relationships among the features. The proposed Pooled Multifuse Feature Augmentation (PMFA) with MCNN and A-BiLSTM achieves an accuracy of 98.66 %, reduces the False Positive Rate (FPR) by 1.01 %, and reduces the loss to 0 %. Thus, the model predicts sounds without overfitting, underfitting, or vanishing-gradient problems.
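The augmentation step described in the abstract can be sketched as follows. This is a minimal standard-library Python illustration, not the authors' implementation: the helper names (add_noise, time_shift, speed_perturb, augment_dataset), the noise factor, the shift amount, and the rates are all hypothetical. True time stretching and pitch shifting need a phase vocoder (for example librosa.effects.time_stretch and librosa.effects.pitch_shift); here both are replaced by plain linear-interpolation resampling, which changes duration and pitch together.

```python
import random


def add_noise(signal, noise_factor=0.005, seed=0):
    # Noise injection: add zero-mean Gaussian noise scaled by noise_factor.
    rng = random.Random(seed)
    return [x + noise_factor * rng.gauss(0.0, 1.0) for x in signal]


def time_shift(signal, shift):
    # Shifting: circularly roll the signal right by `shift` samples.
    s = list(signal)
    if not s:
        return s
    k = shift % len(s)
    return s[-k:] + s[:-k] if k else s


def speed_perturb(signal, rate):
    # Simplified stand-in for stretching/pitching (an assumption, not the
    # paper's method): resample by linear interpolation. rate > 1 shortens
    # the clip and raises its pitch; rate < 1 lengthens it and lowers it.
    s = list(signal)
    if not s:
        return s
    n_out = max(1, round(len(s) / rate))
    out = []
    for i in range(n_out):
        pos = i * rate
        j = int(pos)
        if j >= len(s) - 1:
            out.append(s[-1])  # past the last sample: hold its value
        else:
            frac = pos - j
            out.append((1.0 - frac) * s[j] + frac * s[j + 1])
    return out


def augment_dataset(clips, labels):
    # Apply each augmentation separately to every clip and keep the
    # original, so N labelled clips become 5N labelled instances.
    out_clips, out_labels = [], []
    for clip, label in zip(clips, labels):
        variants = [
            list(clip),
            add_noise(clip),
            time_shift(clip, len(clip) // 10),
            speed_perturb(clip, 0.9),  # slower: stand-in for stretching
            speed_perturb(clip, 1.1),  # faster: stand-in for pitching
        ]
        out_clips.extend(variants)
        out_labels.extend([label] * len(variants))
    return out_clips, out_labels
```

In this sketch each original clip yields five labelled instances (the original plus four variants); the abstract does not state whether the unaugmented clip is kept alongside the augmented ones.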

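The abstract does not define how TB-MFCC is computed. Reading the name literally, the sketch below assumes that each frame's spectrum is obtained with Bluestein's chirp-z algorithm (which computes an exact DFT for any frame length by turning it into a convolution evaluated with power-of-two FFTs) and is then passed through a standard triangular mel filterbank and a DCT-II. It also assumes that "fusion" means concatenating per-frame vectors, shown here only with ZCR and RMS; Chroma, Tempogram, windowing, and framing are omitted. All function names and parameter values (26 mel filters, 13 coefficients) are hypothetical choices for illustration, not the authors' code.

```python
import cmath
import math


def fft_pow2(x, inverse=False):
    # Recursive radix-2 FFT; len(x) must be a power of two.
    # The inverse transform is left unnormalized (caller divides by len).
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_pow2(x[0::2], inverse)
    odd = fft_pow2(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bluestein_dft(x):
    # Exact DFT of arbitrary length n via Bluestein's identity
    # nk = (n^2 + k^2 - (k - n)^2) / 2, which rewrites the DFT as a
    # convolution with a chirp, evaluated here with power-of-two FFTs.
    n = len(x)
    if n == 0:
        return []
    # Chirp w[k] = exp(-i*pi*k^2/n); k^2 is reduced mod 2n for accuracy.
    w = [cmath.exp(-1j * cmath.pi * ((k * k) % (2 * n)) / n) for k in range(n)]
    m = 1
    while m < 2 * n - 1:  # large enough that the circular convolution does not wrap
        m *= 2
    a = [0j] * m
    b = [0j] * m
    for k in range(n):
        a[k] = x[k] * w[k]
    b[0] = w[0].conjugate()
    for k in range(1, n):
        # Negative lags wrap around to the end of the circular buffer.
        b[k] = b[m - k] = w[k].conjugate()
    fa, fb = fft_pow2(a), fft_pow2(b)
    conv = fft_pow2([p * q for p, q in zip(fa, fb)], inverse=True)
    return [w[k] * conv[k] / m for k in range(n)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mfcc_frame(frame, sr, n_mels=26, n_mfcc=13):
    # Power spectrum of one frame (non-negative frequency bins only).
    n = len(frame)
    power = [abs(c) ** 2 / n for c in bluestein_dft(frame)[: n // 2 + 1]]
    # Filter edges equally spaced on the mel scale, as fractional bin indices.
    top = hz_to_mel(sr / 2.0)
    edges = [mel_to_hz(top * i / (n_mels + 1)) * n / sr for i in range(n_mels + 2)]
    log_energies = []
    for j in range(1, n_mels + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        energy = 0.0
        for k, p in enumerate(power):
            if lo < k <= mid:  # rising edge of the triangular filter
                energy += p * (k - lo) / (mid - lo)
            elif mid < k < hi:  # falling edge of the triangular filter
                energy += p * (hi - k) / (hi - mid)
        log_energies.append(math.log(energy + 1e-10))
    # DCT-II of the log mel energies gives the cepstral coefficients.
    return [
        sum(e * math.cos(math.pi * i * (j + 0.5) / n_mels) for j, e in enumerate(log_energies))
        for i in range(n_mfcc)
    ]


def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose sign differs.
    changes = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return changes / max(1, len(frame) - 1)


def rms(frame):
    return math.sqrt(sum(v * v for v in frame) / len(frame))


def fuse_features(frame, sr):
    # Hypothetical fusion: concatenate the per-frame feature vectors.
    return mfcc_frame(frame, sr) + [zero_crossing_rate(frame), rms(frame)]
```

Bluestein's algorithm is useful here mainly because it keeps the DFT exact and O(n log n) for frame lengths such as 400 samples (25 ms at 16 kHz) that are not powers of two; whether this is the role it plays in TB-MFCC is an assumption.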