Subjective and Objective Audio-Visual Quality Assessment for User Generated Content

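Computer science; Benchmark (surveying); Construct (python library); Database; Video quality; Quality (philosophy); PEVQ; Business; Information retrieval; Data mining; Artificial intelligence; Machine learning; Subjective video quality; Image quality; Image (mathematics); Geodesy; Philosophy; Economics; Metric (unit); Epistemology; Programming language; Geography; Operations management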

Authors

Yuqin Cao, Xiongkuo Min, Wei Sun, Guangtao Zhai

Source

Journal: IEEE Transactions on Image Processing [Institute of Electrical and Electronics Engineers]
Date: 2023-01-01 Volume/Issue: 32: 3847-3861 Citations: 14

Links

nih.gov, doi.org

Identifiers

DOI: 10.1109/tip.2023.3290528

Abstract

In recent years, User Generated Content (UGC) has grown dramatically in video sharing applications. Service providers need video quality assessment (VQA) to monitor and control users' Quality of Experience when watching UGC videos. However, most existing UGC VQA studies focus only on the visual distortions of videos, ignoring that the perceptual quality also depends on the accompanying audio signals. In this paper, we conduct a comprehensive study on UGC audio-visual quality assessment (AVQA) from both subjective and objective perspectives. Specifically, we construct the first UGC AVQA database, named the SJTU-UAV database, which includes 520 in-the-wild UGC audio and video (A/V) sequences collected from the YFCC100m database. A subjective AVQA experiment is conducted on the database to obtain the mean opinion scores (MOSs) of the A/V sequences. To demonstrate the content diversity of the SJTU-UAV database, we give a detailed analysis of it alongside two synthetically distorted AVQA databases and one authentically distorted VQA database, from both the audio and video aspects. Then, to facilitate the development of the AVQA field, we construct a benchmark of AVQA models on the proposed SJTU-UAV database and the two other AVQA databases. The benchmark models consist of AVQA models designed for synthetically distorted A/V sequences and AVQA models built by combining popular VQA methods with audio features via a support vector regressor (SVR). Finally, considering that the benchmark AVQA models perform poorly in assessing in-the-wild UGC videos, we further propose an effective AVQA model that jointly learns quality-aware audio and visual feature representations in the temporal domain, which is seldom investigated by existing AVQA models. Our proposed model outperforms the aforementioned benchmark AVQA models on the SJTU-UAV database and the two synthetically distorted AVQA databases. The SJTU-UAV database and the code of the proposed model will be released to facilitate further research.
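
In its simplest form, a mean opinion score is the average of the ratings that subjects give to one sequence. The following is a minimal sketch of that averaging step only. The function name, the sequence IDs, the ratings, and the 1-to-5 scale are all hypothetical; the abstract does not state the rating scale, subject screening, or score normalization used for SJTU-UAV, so none of that is modelled here.

```python
from statistics import fmean, stdev


def mean_opinion_scores(ratings):
    """Return {sequence_id: (mos, std)} from {sequence_id: [rating, ...]}.

    Plain per-sequence averaging; no outlier rejection or per-subject
    normalization is applied (both are assumptions left out on purpose).
    """
    result = {}
    for seq_id, scores in ratings.items():
        if not scores:
            raise ValueError(f"no ratings for sequence {seq_id!r}")
        # Sample standard deviation needs at least two ratings.
        spread = stdev(scores) if len(scores) > 1 else 0.0
        result[seq_id] = (fmean(scores), spread)
    return result


# Hypothetical ratings on an assumed 1-to-5 scale, one entry per subject.
ratings = {
    "seq_001": [4, 5, 4, 3, 4],
    "seq_002": [2, 1, 2, 3, 2],
}
mos = mean_opinion_scores(ratings)
for seq_id, (score, spread) in mos.items():
    print(f"{seq_id}: MOS={score:.2f} (std={spread:.2f})")
```

Keeping the standard deviation next to each MOS makes it easy to see how strongly subjects disagreed on a given sequence.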
