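Computer science
Segmentation
Speech recognition
Event (particle physics)
Artificial intelligence
Inference
Audio signal processing
Pattern recognition (psychology)
Search engine indexing
Smoothing
Audio signal
Computer vision
Speech coding
Physics
Quantum mechanics
Authors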
Satvik Venkatesh,David Moffat,Eduardo Reck Miranda
Source
Journal: Applied Sciences
[Multidisciplinary Digital Publishing Institute]
Date: 2022-03-24
Volume (issue): 12 (7): 3293
Citations: 31
Abstract
Audio segmentation and sound event detection are crucial topics in machine listening that aim to detect acoustic classes and their respective boundaries. They are useful for audio-content analysis, speech recognition, audio indexing, and music information retrieval. In recent years, most research articles have adopted segmentation-by-classification, a technique that divides audio into small frames and classifies each frame individually. In this paper, we present a novel approach called You Only Hear Once (YOHO), which is inspired by the YOLO algorithm popularly adopted in computer vision. We convert the detection of acoustic boundaries into a regression problem instead of frame-based classification. This is done by having separate output neurons to detect the presence of an audio class and to predict its start and end points. The relative improvement in F-measure of YOHO, compared to the state-of-the-art Convolutional Recurrent Neural Network, ranged from 1% to 6% across multiple datasets for audio segmentation and sound event detection. As the output of YOHO is more end-to-end and has fewer neurons to predict, inference is at least 6 times faster than with segmentation-by-classification. In addition, as this approach predicts acoustic boundaries directly, post-processing and smoothing are about 7 times faster.
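To make the regression formulation concrete, the following is a minimal sketch of how an output of this kind could be decoded into events. It is not the authors' implementation. The abstract only states that separate output neurons predict presence, start, and end for each class; the output layout, the function names `decode_regression_output` and `merge_adjacent`, the 0.5 presence threshold, the bin length, and the convention that start and end are fractions of a time bin are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# (class label, start in seconds, end in seconds)
Event = Tuple[str, float, float]


def decode_regression_output(
    output: Sequence[Sequence[Tuple[float, float, float]]],
    class_names: Sequence[str],
    bin_seconds: float,
    threshold: float = 0.5,  # assumed presence threshold
) -> List[Event]:
    """Turn per-bin (presence, rel_start, rel_end) triples into events.

    Assumed layout: output[t][c] holds three values for time bin t and
    class c, with rel_start and rel_end expressed as fractions of the bin.
    """
    events: List[Event] = []
    for t, step in enumerate(output):
        bin_start = t * bin_seconds
        for label, (presence, rel_start, rel_end) in zip(class_names, step):
            # Keep a prediction only if the class is judged present and
            # the predicted interval has positive length.
            if presence >= threshold and rel_end > rel_start:
                start = bin_start + rel_start * bin_seconds
                end = bin_start + rel_end * bin_seconds
                events.append((label, start, end))
    return events


def merge_adjacent(events: Sequence[Event], max_gap: float = 1e-6) -> List[Event]:
    """Join same-class events that touch across bin borders."""
    merged: List[Event] = []
    for label, start, end in sorted(events):
        if merged and merged[-1][0] == label and start - merged[-1][2] <= max_gap:
            prev = merged[-1]
            merged[-1] = (label, prev[1], max(prev[2], end))
        else:
            merged.append((label, start, end))
    return merged


# Two hypothetical 2-second bins and two hypothetical classes.
example_output = [
    [(0.9, 0.5, 1.0), (0.1, 0.0, 0.0)],
    [(0.8, 0.0, 0.25), (0.7, 0.5, 0.75)],
]
raw_events = decode_regression_output(example_output, ["speech", "music"], 2.0)
final_events = merge_adjacent(raw_events)
```

Because boundaries are read directly from the predicted values, the only post-processing in this sketch is joining intervals that meet at a bin border, rather than smoothing a long sequence of per-frame labels.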