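Computer science
Artificial intelligence
Transformer
RGB color model
Modal verb
Computer vision
Pattern recognition (psychology)
Machine learning
Quantum mechanics
Physics
Voltage
Chemistry
Polymer chemistry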
Authors
Junke Wang, Zuxuan Wu, Wenhao Ouyang, Xintong Han, Jingjing Chen, Yu-Gang Jiang, Ser-Nam Li
Identifier
DOI:10.1145/3512527.3531415
Abstract
The widespread dissemination of Deepfakes demands effective approaches that can detect perceptually convincing forged images. In this paper, we aim to capture the subtle manipulation artifacts at different scales using transformer models. In particular, we introduce a Multi-modal Multi-scale TRansformer (M2TR), which operates on patches of different sizes to detect local inconsistencies in images at different spatial levels. M2TR further learns to detect forgery artifacts in the frequency domain to complement RGB information through a carefully designed cross-modality fusion block. In addition, to stimulate Deepfake detection research, we introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 Deepfake videos generated by state-of-the-art face swapping and facial reenactment methods. We conduct extensive experiments to verify the effectiveness of the proposed method, which outperforms state-of-the-art Deepfake detection methods by clear margins.
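The two input views mentioned in the abstract, patches of different sizes and a frequency-domain representation, can be illustrated with a minimal NumPy sketch. This is not the M2TR implementation: the function names, the patch sizes (8, 16, 32), and the use of a plain 2D FFT log-magnitude spectrum are all assumptions chosen for illustration, and the transformer and fusion block themselves are not shown.

```python
import numpy as np


def split_into_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping square patches.

    Returns an array of shape (num_patches, patch_size, patch_size, C),
    ordered row by row. Hypothetical helper, not taken from the paper.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (rows, cols, p, p, C)
    return patches.reshape(rows * cols, patch_size, patch_size, c)


def multi_scale_views(image, patch_sizes=(8, 16, 32)):
    """Patch the same image at several scales (the sizes are assumed values)."""
    return {p: split_into_patches(image, p) for p in patch_sizes}


def log_magnitude_spectrum(image):
    """Per-channel 2D FFT log-magnitude with the zero frequency centered.

    A generic frequency-domain view; the paper's own frequency features
    may be computed differently.
    """
    spectrum = np.fft.fft2(image, axes=(0, 1))
    spectrum = np.fft.fftshift(spectrum, axes=(0, 1))
    return np.log1p(np.abs(spectrum))


if __name__ == "__main__":
    # Random stand-in for a 64x64 RGB face crop.
    rng = np.random.default_rng(0)
    face = rng.random((64, 64, 3))
    for size, patches in multi_scale_views(face).items():
        print(size, patches.shape)
    print(log_magnitude_spectrum(face).shape)
```

Smaller patches expose fine local inconsistencies while larger ones cover broader regions, which is the intuition behind processing several scales; how those views are combined with the frequency features is specific to the paper and is left out here.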