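Computer science
Artificial intelligence
Classifier (UML)
Encoder
Pattern recognition (psychology)
Feature learning
Machine learning
Context (archaeology)
Biology
Operating system
Paleontology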
Authors
Shiming Ge, Fanzhao Lin, Chenyu Li, Daichi Zhang, Jiyong Tan, Weiping Wang, Dan Zeng
Identifier
DOI: 10.1145/3469877.3490586
Abstract
Increasingly advanced deepfake approaches have made the detection of deepfake videos very challenging. We observe that deepfake videos in general often exhibit appearance-level temporal inconsistencies in some facial components between frames, resulting in discriminable spatiotemporal latent patterns among semantic-level feature maps. Inspired by this finding, we propose a predictive representation learning approach termed Latent Pattern Sensing to capture these semantic change characteristics for deepfake video detection. The approach cascades a CNN-based encoder, a ConvGRU-based aggregator and a single-layer binary classifier. The encoder and aggregator are pre-trained in a self-supervised manner to form representative spatiotemporal context features. The classifier is then trained on these context features to distinguish fake videos from real ones. In this manner, the extracted features describe the latent patterns of videos across frames, spatially and temporally, in a unified way, leading to an effective deepfake video detector. Extensive experiments demonstrate the effectiveness of our approach, e.g., it surpasses 10 state-of-the-art methods by at least 7.92% in AUC on the challenging Celeb-DF (v2) benchmark.
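The cascade described in the abstract (encoder, then ConvGRU aggregator, then single-layer classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `LatentPatternSketch`, all layer sizes, the one-layer stand-in for the CNN encoder, the use of the standard ConvGRU gate equations, and the random untrained weights are assumptions made purely for illustration. The self-supervised pre-training stage is omitted, and frame height and width are assumed to be even.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def conv2d_same(x, w):
    """Zero-padded 'same' 2D cross-correlation.
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    k = w.shape[-1]
    p = k // 2
    _, H, W = x.shape
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((w.shape[0], H, W))
    for i in range(k):
        for j in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out

class LatentPatternSketch:
    """Hypothetical encoder -> ConvGRU aggregator -> linear classifier cascade."""

    def __init__(self, in_ch=3, feat_ch=8, hid_ch=8, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda *shape: rng.normal(0.0, 0.1, size=shape)
        self.hid_ch = hid_ch
        self.w_enc = init(feat_ch, in_ch, 3, 3)           # stand-in for a CNN encoder
        self.w_z = init(hid_ch, feat_ch + hid_ch, 3, 3)   # update gate
        self.w_r = init(hid_ch, feat_ch + hid_ch, 3, 3)   # reset gate
        self.w_h = init(hid_ch, feat_ch + hid_ch, 3, 3)   # candidate state
        self.w_cls = init(hid_ch)                         # single-layer classifier
        self.b_cls = 0.0

    def encode(self, frame):
        # One conv + ReLU + 2x2 average pooling (assumes even H and W).
        f = np.maximum(conv2d_same(frame, self.w_enc), 0.0)
        C, H, W = f.shape
        return f.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

    def aggregate(self, feats):
        # Standard ConvGRU recurrence over per-frame feature maps.
        h = np.zeros((self.hid_ch,) + feats[0].shape[1:])
        for x in feats:
            xh = np.concatenate([x, h], axis=0)
            z = sigmoid(conv2d_same(xh, self.w_z))
            r = sigmoid(conv2d_same(xh, self.w_r))
            cand = np.tanh(conv2d_same(np.concatenate([x, r * h], axis=0), self.w_h))
            h = (1.0 - z) * h + z * cand
        return h  # spatiotemporal context feature map

    def predict_fake_prob(self, video):
        # video: (T, C, H, W) array of frames.
        context = self.aggregate([self.encode(frame) for frame in video])
        pooled = context.mean(axis=(1, 2))  # global average pooling
        return float(sigmoid(pooled @ self.w_cls + self.b_cls))

model = LatentPatternSketch()
video = np.random.default_rng(1).random((5, 3, 16, 16))  # synthetic 5-frame clip
context = model.aggregate([model.encode(f) for f in video])
prob = model.predict_fake_prob(video)
```

With untrained weights the output probability carries no meaning; the sketch only shows how per-frame feature maps are folded into one context map that a single linear layer then scores.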