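Digital watermarking
Computer science
Audio-visual
Digital Watermarking Alliance
Artificial intelligence
Computer vision
Multimedia
Image (mathematics)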
Authors
Bofei Guo,Haoxuan Tai,Guibo Luo,Yuesheng Zhu
Identifier
DOI:10.1109/iceiec61773.2024.10561738
Abstract
The rise of Deepfake technology presents a significant challenge to the integrity of information. Most existing Deepfake detection methods rely on visual artifacts to distinguish between authentic and manipulated content, but they cannot cope with unseen tampering methods and are easily affected by post-processing. Although recent investigations have tried to proactively protect facial images using deep watermarking techniques, more deceptive Deepfakes often incorporate both visual and audio modalities. To address this issue, we propose a novel proactive Deepfake detection framework for both audio and visual modalities that uses a unified encoder-decoder architecture to embed audio-visual watermarks. In addition, an audio-visual feature encoder is developed to align the audio and visual information. The multi-modal watermarking is designed to embed a watermark as the detection clue in each modality and to verify both modalities together to detect Deepfaked multimedia. By adding a distortion layer between embedding and extraction during training, the embedded watermark becomes robust against common post-processing operations (e.g., JPEG compression) while remaining sensitive to Deepfake manipulations (e.g., SimSwap) during watermark verification. Our experimental results on VidTIMIT demonstrate that the proposed watermarking framework can effectively detect various advanced Deepfake manipulations and achieves good robustness to different kinds of common distortions compared with passive uni-modal and multi-modal Deepfake detection methods.
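The joint verification step described in the abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the use of bit error rate as the comparison metric, the 0.2 threshold, and the rule that a failure in either modality flags the clip are all assumptions made for the example. The abstract itself does not specify these details.

```python
from typing import Sequence


def bit_error_rate(embedded: Sequence[int], extracted: Sequence[int]) -> float:
    """Fraction of watermark bits that differ between embedding and extraction."""
    if len(embedded) != len(extracted):
        raise ValueError("watermarks must have the same length")
    if len(embedded) == 0:
        raise ValueError("watermarks must not be empty")
    mismatches = sum(a != b for a, b in zip(embedded, extracted))
    return mismatches / len(embedded)


def is_deepfaked(visual_ber: float, audio_ber: float, threshold: float = 0.2) -> bool:
    """Hypothetical joint decision: flag the clip if either modality's
    extracted watermark deviates too much from the embedded one.

    The threshold is an illustrative value, not one taken from the paper.
    """
    return visual_ber > threshold or audio_ber > threshold


# Toy example: the visual watermark survives (e.g. after JPEG compression),
# while the audio watermark is heavily damaged (e.g. after manipulation).
visual = bit_error_rate([1, 0, 1, 1], [1, 0, 1, 1])
audio = bit_error_rate([1, 0, 1, 1], [0, 1, 1, 1])
flagged = is_deepfaked(visual, audio)
```

Under these assumptions, a benign distortion should leave both error rates below the threshold, whereas a manipulation of either stream pushes at least one above it.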