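Mel cepstrum
Artificial neural network
Microphone
Computer science
Pattern recognition (psychology)
Artificial intelligence
Multilayer perceptron
Speech recognition
Feature extraction
Sound pressure
Telecommunications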
Authors
Rafael Zinni Lopes, Gustavo C. Dacanal
Abstract
Crispness is a textural characteristic that influences consumer choices, so customizing products requires a thorough understanding of it. Previous studies employing neural networks focused on acquiring audio through mechanical crushing of crispy samples. This research investigates the representation of crispy sound in time intervals and frequency domains, identifying key parameters to distinguish different foods. Two machine learning architectures, multi-layer perceptron (MLP) and residual neural network (ResNet), were used to analyze mel frequency cepstral coefficients (MFCC) and discrete Fourier transform (DFT) data, respectively. The models achieved over 95% "in-sample" accuracy, successfully classifying fried chicken, potato chips, and toast using randomly extracted audio from ASMR videos. The MLP (MFCC) model demonstrated superior robustness compared to ResNet and correctly predicted external inputs, such as freshly toasted bread recorded with a microphone or ASMR audio of toast in milk. In contrast, the ResNet model proved more responsive to variations in the DFT spectrum and was unable to predict the similarity of external audio sources, which makes it useful mainly for classifying pretrained "in-sample" data. These findings are useful for classifying crispness among individual food sources. Additionally, the study explores the promising use of ASMR audio from Internet platforms to pretrain artificial neural network models, expanding the dataset for investigating the texture of crispy foods.
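The two feature types named in the abstract, DFT spectra and MFCCs, can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the function names, the frame length, the hop size, the number of mel filters, and the number of coefficients are all assumptions chosen for illustration, and the abstract does not state which values or libraries were actually used.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (one of several conventions in use).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def frame_signal(x, frame_len, hop):
    # Split a 1-D signal into overlapping frames of shape (n_frames, frame_len).
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def dft_power(frames):
    # Hann-windowed one-sided DFT power spectrum of each frame.
    window = np.hanning(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, axis=1)) ** 2

def mel_filterbank(n_filters, frame_len, sr):
    # Triangular filters spaced evenly on the mel scale.
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2))
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct_ii(x, n_coeffs):
    # Unnormalized DCT-II along the last axis, keeping the first n_coeffs terms.
    n = x.shape[1]
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    return x @ basis.T

def mfcc(x, sr, frame_len=1024, hop=512, n_filters=26, n_coeffs=13):
    # Assumed parameter values; the abstract does not specify them.
    power = dft_power(frame_signal(x, frame_len, hop))
    mel_energy = power @ mel_filterbank(n_filters, frame_len, sr).T
    return dct_ii(np.log(mel_energy + 1e-10), n_coeffs)

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 1000.0 * t)  # synthetic stand-in for a recording
    print(dft_power(frame_signal(tone, 1024, 512)).shape)
    print(mfcc(tone, sr).shape)
```

In a setup like the one described, the per-frame MFCC vectors would be the kind of input given to an MLP, while the DFT spectra would be the kind of input given to a ResNet. The MFCC step compresses each spectrum into a few smooth coefficients, which is one plausible reason such features react less to small spectral variations than the raw DFT does.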