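Computer science
Benchmark (surveying)
Sound quality
Speech recognition
Quality (philosophy)
Audio-visual
Video quality
Subjective video quality
Machine learning
Predictive modelling
Quality assessment
Artificial intelligence
Multimedia
Image quality
Metric (unit)
Engineering
Reliability engineering
Evaluation methods
Philosophy
Geodesy
Geography
Image (mathematics)
Epistemology
Operations management
Authors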
Helard Becerra Martinez,Andrew Hines,Mylène C. Q. Farias
Identifier
DOI:10.1109/qomex55416.2022.9900891
Abstract
Single-modal audio/speech and video quality models have reached high levels of performance. Although traditional algorithms are still preferred for many practical applications, advances in machine learning (ML) and deep learning techniques have exceeded their performance in several scientific comparisons. However, audio-visual (AV) models have received significantly less attention and development. Despite the acknowledged challenge that multimodal interaction poses to the AV problem, traditional AV models generally rely on simple fusion techniques of individual audio and video predictions. Consequently, the impact of recent advances in single-modal quality assessment models on SOTA (state-of-the-art) AV quality models merits attention. This paper presents a revised and updated benchmark for AV quality assessment with particular focus on new speech quality metrics. Three AV datasets were used to test audio, video, and AV quality metrics. For audio and video, the best performing metrics were selected to build simple late-fusion models using their raw predictions. The fused models were then compared to the SOTA AV models. Results show that a simple fusion strategy produces accurate AV quality predictions (LCC and SCC greater than 0.90) with low error rates (RMSE lower than 0.33). These results highlight the influence of advances in speech quality for AV quality assessment.
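The late-fusion and evaluation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion function was used, so this example assumes a weighted linear combination of the audio and video predictions, fitted by least squares against subjective AV scores. All function names (`fit_late_fusion`, `fuse`, `pearson`, `spearman`, `rmse`) are hypothetical, and the metrics are plain implementations of LCC (Pearson), SCC (Spearman), and RMSE rather than the authors' code.

```python
import math

def pearson(x, y):
    """Linear correlation coefficient (LCC) between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Rank correlation coefficient (SCC): Pearson on the ranks."""
    return pearson(ranks(x), ranks(y))

def rmse(pred, target):
    """Root mean squared error between predictions and subjective scores."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_late_fusion(audio, video, mos):
    """Assumed fusion model: mos ~ w_a * audio + w_v * video + bias.

    Fitted via the normal equations; returns [w_a, w_v, bias].
    """
    rows = [[a, v, 1.0] for a, v in zip(audio, video)]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * y for r, y in zip(rows, mos)) for i in range(3)]
    return solve(ata, atb)

def fuse(audio, video, weights):
    """Apply fitted weights to raw audio and video metric predictions."""
    w_a, w_v, bias = weights
    return [w_a * a + w_v * v + bias for a, v in zip(audio, video)]
```

In use, `audio` and `video` would hold the raw per-clip outputs of the selected single-modal metrics and `mos` the subjective AV scores; the weights would be fitted on one portion of the data, and `pearson`, `spearman`, and `rmse` computed on `fuse(...)` outputs for held-out clips.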