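Computer science
Overfitting
Stereo recording
Event (particle physics)
Artificial intelligence
Optics (focusing)
Context (archaeology)
Feature extraction
Benchmark (surveying)
Feature (linguistics)
Process (computing)
Convolutional neural network
Pattern recognition (psychology)
Recurrent neural network
Pixel
Speech recognition
Artificial neural network
Paleontology
Philosophy
Physics
Optics
Operating system
Biology
Geography
Quantum mechanics
Linguistics
Geodesy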
Authors
Jiaxiang Meng,Xingmei Wang,Wang Jin-li,Xuyang Teng,Yuelin Xu
Identifier
DOI:10.1016/j.dsp.2022.103434
Abstract
Sound event detection has recently become an active topic in sound processing, yet discontinuous and overlapping sound events remain challenging. In this paper, we propose a capsule network with pixel-based attention and a bidirectional gated recurrent unit (PBA-AttCapsNet-BGRU), a model comprising a high-level feature extraction module, an attention capsule network (AttCapsNet) module, and a bidirectional gated recurrent unit (BGRU) module. Specifically, in the high-level feature extraction module, pixel-based attention (PBA) is employed in a convolutional neural network, named PBACNN, to extract features more relevant to sound events from binaural log-Mel spectrogram (bin-LMS) features. This module addresses the problem of discontinuous sound events. Furthermore, to detect overlapping sound events effectively, we propose an AttCapsNet module that combines a capsule network (CapsNet) with a soft attention mechanism. An attention dynamic routing algorithm is also introduced to distinguish effectively whether sound events are present and to focus on the significant frames. In addition, the BGRU module is composed of BGRU and time-distributed fully connected layers; it captures context information and mitigates overfitting to a certain extent. We conducted experiments on Task 4 of the DCASE 2017 Challenge. Experimental results show that the proposed PBA-AttCapsNet-BGRU model improves F1 by 0.032 and ER by 0.07 compared with state-of-the-art sound event detection models.
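The abstract reports its results as F1 and ER but does not say how they are computed. As a minimal sketch, assuming the segment-based definitions commonly used for DCASE sound event detection tasks (F1 from pooled true positives, false positives, and false negatives; ER as substitutions plus deletions plus insertions over the number of reference events), the two scores could be calculated as below. The function name `segment_f1_er` and the toy activity matrices are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def segment_f1_er(reference: List[List[int]],
                  estimate: List[List[int]]) -> Tuple[float, float]:
    """Segment-based F1 and error rate (ER).

    Each argument holds one row per time segment and one 0/1 entry per
    event class (1 = class active in that segment). This layout is an
    assumption made for illustration.
    """
    tp = fp = fn = 0
    subs = dels = ins = 0
    n_ref = 0
    for ref_row, est_row in zip(reference, estimate):
        seg_tp = sum(1 for r, e in zip(ref_row, est_row) if r and e)
        seg_fp = sum(1 for r, e in zip(ref_row, est_row) if e and not r)
        seg_fn = sum(1 for r, e in zip(ref_row, est_row) if r and not e)
        tp += seg_tp
        fp += seg_fp
        fn += seg_fn
        # Within a segment, a miss paired with a false alarm counts as one
        # substitution; leftover misses are deletions, leftover false
        # alarms are insertions.
        subs += min(seg_fn, seg_fp)
        dels += max(0, seg_fn - seg_fp)
        ins += max(0, seg_fp - seg_fn)
        n_ref += sum(ref_row)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    er = (subs + dels + ins) / n_ref if n_ref else 0.0
    return f1, er


# Toy example: 3 segments, 3 event classes.
reference = [[1, 1, 0], [0, 1, 0], [1, 0, 0]]
estimate = [[1, 0, 0], [0, 1, 1], [0, 1, 0]]
f1, er = segment_f1_er(reference, estimate)
print(f1, er)  # 0.5 0.75
```

Under these definitions a higher F1 and a lower ER are both better, so an "improvement in ER" corresponds to a reduction in the error rate.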