With the rapid development of face forgery techniques, existing frame-based deepfake video detection methods have fallen into a dilemma: they may fail when encountering extremely realistic forged images. To overcome this problem, many approaches attempt to model the spatio-temporal inconsistency of videos to distinguish real from fake videos. However, current works model spatio-temporal inconsistency by combining intra-frame and inter-frame information while ignoring the disturbance caused by natural facial motions, which limits further improvement in detection performance. To address this issue, we investigate long- and short-range inter-frame motions and propose a novel dynamic difference learning method that distinguishes the inter-frame differences caused by face manipulation from those caused by facial motions, in order to model precise spatio-temporal inconsistency for deepfake video detection. Moreover, we elaborately design a dynamic fine-grained difference capture module (DFDC-module) and a multi-scale spatio-temporal aggregation module (MSA-module) to collaboratively model spatio-temporal inconsistency. Specifically, the DFDC-module applies a self-attention mechanism and a fine-grained denoising operation to eliminate the differences caused by facial motions and generates long-range difference attention maps. The MSA-module is devised to aggregate multi-direction and multi-scale temporal information to model spatio-temporal inconsistency. Existing 2D CNNs can be extended into dynamic spatio-temporal inconsistency capture networks by integrating the two proposed modules. Extensive experimental results demonstrate that our algorithm consistently outperforms state-of-the-art methods by a clear margin on multiple benchmark datasets.
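To make the dynamic difference idea concrete, the sketch below illustrates one way inter-frame differences at a configurable temporal stride (short range with stride 1, longer range with larger strides) can be weighted by a spatial self-attention map and fed back as a gating signal. The `DifferenceAttention` class, its layer sizes, and the sigmoid gating are illustrative assumptions for a minimal PyTorch example, not the authors' exact DFDC-module or MSA-module.

```python
import torch
import torch.nn as nn


class DifferenceAttention(nn.Module):
    """Minimal sketch: attention over inter-frame differences.

    Computes channel-reduced differences between frames at a given temporal
    stride, applies spatial self-attention to the difference maps, and turns
    the result into a gating map applied back to the frame features. All
    design choices here are assumptions for illustration only.
    """

    def __init__(self, channels: int, reduction: int = 4, stride: int = 1):
        super().__init__()
        self.stride = stride
        hidden = max(channels // reduction, 1)
        self.reduce = nn.Conv2d(channels, hidden, kernel_size=1)
        self.query = nn.Conv2d(hidden, hidden, kernel_size=1)
        self.key = nn.Conv2d(hidden, hidden, kernel_size=1)
        self.expand = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, height, width)
        b, t, c, h, w = x.shape
        # Inter-frame difference at the chosen temporal stride.
        diff = x[:, self.stride:] - x[:, :-self.stride]      # (b, t-s, c, h, w)
        diff = diff.flatten(0, 1)                             # (b*(t-s), c, h, w)
        d = self.reduce(diff)
        # Spatial self-attention over the difference maps.
        q = self.query(d).flatten(2)                          # (n, c', h*w)
        k = self.key(d).flatten(2)                            # (n, c', h*w)
        attn = torch.softmax(q.transpose(1, 2) @ k / (q.shape[1] ** 0.5), dim=-1)
        d = (d.flatten(2) @ attn).view_as(d)
        # Turn the attended differences into a per-pixel gating map.
        gate = torch.sigmoid(self.expand(d)).view(b, t - self.stride, c, h, w)
        out = x.clone()
        out[:, self.stride:] = x[:, self.stride:] * (1 + gate)
        return out


if __name__ == "__main__":
    clip = torch.randn(2, 8, 64, 56, 56)                  # 8-frame feature clip
    module = DifferenceAttention(channels=64, stride=1)    # stride>1 for long range
    print(module(clip).shape)                              # torch.Size([2, 8, 64, 56, 56])
```

Because the module preserves the input shape, it could in principle be interleaved with the blocks of a 2D CNN backbone, which mirrors the paper's claim that existing 2D CNNs can be extended by plugging in the proposed modules.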