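Computer science
Speech recognition
Voice activity detection
Mel cepstrum
Transformer
Artificial intelligence
Encoder
Classifier (UML)
Pattern recognition (psychology)
Speech processing
Feature extraction
Engineering
Voltage
Electrical engineering
Operating system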
Authors
Wenpeng Mu,Bingshan Liu
Source
Journal: IEEE Access
[Institute of Electrical and Electronics Engineers]
Date: 2023-01-01
Volume: 11, pages 31238-31243
Citations: 10
Identifier
DOI:10.1109/access.2023.3262518
Abstract
Voice Activity Detection (VAD) is a widely used technique for separating vocal regions from audio signals, with applications in speech coding, noise reduction, and other domains. Various strategies have been proposed to improve VAD performance, such as ACAM, DCU-10, and Tr-VAD, but these approaches share common limitations: they are unsuitable for long audio and are time-consuming. To address these issues, we propose a new method called AAT-VAD, which integrates an adaptive-width attention learning mechanism into the classic transformer framework. Our approach extracts Mel-frequency cepstral coefficients (MFCCs) from the audio, adds a masking function to each transformer attention head, and feeds the features processed by the transformer encoder layer into the classifier. Experimental results indicate that our method achieves a 12.8% higher F1-score than DCU-10 and a 0.6% higher F1-score than Tr-VAD under different noise interferences. Furthermore, the average detection cost function (DCF) value of our method is only 14.3% of that of DCU-10 and 92.4% of that of Tr-VAD, and the test time of AAT-VAD is only 37.4% of that of Tr-VAD for the same noisy vocal mixed audio.
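The abstract does not give the form of the masking function, so the following is only a minimal sketch of how a width-limiting mask could be applied to one attention head. It assumes a soft, piecewise-linear mask of the kind used in adaptive attention span models: the weight is 1 within `span` frames of the query and falls linearly to 0 over the next `ramp` frames. The names `soft_span_mask`, `masked_attention`, `span`, and `ramp` are hypothetical and not taken from the paper, and `span` is a fixed number here, whereas an adaptive mechanism would learn it per head.

```python
import numpy as np


def soft_span_mask(distances, span, ramp=4.0):
    """Soft mask in [0, 1]: 1 up to `span` frames away, then a linear
    fall-off to 0 over the next `ramp` frames (assumed form)."""
    return np.clip((ramp + span - distances) / ramp, 0.0, 1.0)


def masked_attention(q, k, v, span, ramp=4.0):
    """Single-head scaled dot-product attention with a soft width mask.

    q, k, v: arrays of shape (n_frames, d). Returns (output, weights).
    Assumes span >= 0 so every frame can at least attend to itself.
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # Subtract the row maximum for numerical stability before exponentiating.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    # Distance in frames between every query position and every key position.
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    # Down-weight (or zero out) keys that lie outside the attention width,
    # then renormalise so each row sums to 1 again.
    weights = weights * soft_span_mask(dist, span, ramp)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((10, 8))  # stand-in for 10 frames of features
    out, w = masked_attention(feats, feats, feats, span=2, ramp=2.0)
    print(out.shape)            # shape of the attended features
    print(np.round(w[0], 3))    # weights of frame 0 over all frames
```

With `span=2` and `ramp=2.0`, every key four or more frames away from the query receives zero weight, so a head with a narrow width only needs to consider a short local window. That locality is one plausible reason a width-limited head could be cheaper on long audio, though the abstract itself does not describe the mechanism in this detail.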