Computer science
Scale (ratio)
Sound (geography)
Artificial intelligence
Speech recognition
Acoustics
Cartography
Geography
Physics
Authors
Bing Han, Zhiqiang Lv, Anbai Jiang, Wen Huang, Zhengyang Chen, Yufeng Deng, Jingzhong Ding, Cheng Liu, Weiqiang Zhang, Pingyi Fan, Jia Liu, Qian Ye
Identifier
DOI:10.1109/icassp48485.2024.10447183
Abstract
Machine anomalous sound detection is a useful technique for various applications, but it often generalizes poorly because data collection is difficult and acoustic environments are complex. To address this issue, we propose a robust machine anomalous sound detection model that leverages self-supervised models pre-trained on large-scale speech data. Specifically, we assign different weights to the features from different layers of the pre-trained model and then use the working condition as the label for self-supervised classification fine-tuning. Moreover, we introduce a data augmentation method that simulates different operating states of the machine to enrich the dataset. Furthermore, we devise a transformer pooling method that fuses the features of different segments. Experiments on the DCASE2023 dataset show that our proposed method outperforms the commonly used reconstruction-based autoencoder and classification-based convolutional network by a large margin, demonstrating the effectiveness of large-scale pre-training for enhancing the generalization and robustness of machine anomalous sound detection. In Task 2 of DCASE2023, we achieved 2nd place with these methods.
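The abstract says that features from different layers of the pre-trained model receive different weights, but it does not say how those weights are computed. One common choice, assumed here and not confirmed by the abstract, is a softmax over one learnable scalar per layer. The sketch below illustrates that idea in plain Python on toy data; the names `softmax`, `weighted_layer_sum`, and `layer_scores` are hypothetical and not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layer_features, layer_scores):
    """Fuse per-layer feature vectors into a single vector.

    layer_features: list of L vectors (one per layer), each of length D.
    layer_scores:   list of L scalars; in a real model these would be
                    learnable parameters (an assumption of this sketch).
    """
    if len(layer_features) != len(layer_scores):
        raise ValueError("need exactly one score per layer")
    weights = softmax(layer_scores)  # non-negative, sums to 1
    dim = len(layer_features[0])
    return [
        sum(w * feats[d] for w, feats in zip(weights, layer_features))
        for d in range(dim)
    ]


# Toy example: 3 layers with 2-dimensional features.
# Equal scores give equal weights, so the result is the plain average.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fused = weighted_layer_sum(features, [0.0, 0.0, 0.0])
```

In practice the scores would be trained jointly with the classifier during fine-tuning, so the model can learn which layers carry the most useful information for machine sounds.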