PVASS-MDD: Predictive Visual-Audio Alignment Self-Supervision for Multimodal Deepfake Detection

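Computer science, Audio-visual, Artificial intelligence, Modality (human-computer interaction), Machine learning, Pattern recognition (psychology), Multimedia
Authors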
Yang Yu,Xiaolong Liu,Rongrong Ni,Siyuan Yang,Yao Zhao,Alex C. Kot
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology [Institute of Electrical and Electronics Engineers]
Volume (issue): 34 (8): 6926-6936 Citations: 12
Identifier
DOI: 10.1109/tcsvt.2023.3309899
Abstract

Deepfake techniques can forge the visual or audio signals in a video, which leads to inconsistencies between the visual and audio (VA) signals. Multimodal detection methods therefore expose deepfake videos by extracting VA inconsistencies. Recently, deepfake techniques have begun to forge the visual and audio signals jointly (VA collaborative forgery) to obtain more realistic deepfake videos, which poses new challenges for extracting VA inconsistencies. Recent multimodal detection methods propose to first extract natural VA correspondences from real videos in a self-supervised manner, and then use the learned real correspondences as targets to guide the extraction of VA inconsistencies in the subsequent deepfake detection stage. However, the inherent VA relations are difficult to extract because of the modality gap, which limits the auxiliary benefit of these self-supervised methods. In this paper, we propose Predictive Visual-audio Alignment Self-supervision for Multimodal Deepfake Detection (PVASS-MDD), which consists of a PVASS auxiliary stage and an MDD stage. In the PVASS auxiliary stage, which operates on real videos, we first devise a three-stream network that associates two augmented visual views with the corresponding audio clues, in order to explore common VA correspondences through cross-view learning. Second, we introduce a novel cross-modal predictive align module that eliminates VA gaps to provide inherent VA correspondences. In the MDD stage, we propose an auxiliary loss that uses the frozen PVASS network to align the VA features of real videos, which better assists the multimodal deepfake detector in capturing subtle VA inconsistencies. We conduct extensive experiments on both widely used and recent multimodal deepfake datasets. Our method achieves a significant performance improvement over state-of-the-art methods.
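
The abstract mentions an auxiliary loss that aligns the visual and audio features of real videos, but it does not give the loss itself. The sketch below is only a rough illustration of what "aligning" two time-synchronized feature sequences could mean. It assumes a mean cosine-distance objective, which is a stand-in chosen here and not the formulation from the paper. The names `cosine` and `alignment_loss` and the toy feature vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def alignment_loss(visual_feats, audio_feats):
    """Mean (1 - cosine similarity) over time-aligned visual/audio feature pairs.

    Assumption: one feature vector per time step in each modality. This is an
    illustrative stand-in, not the auxiliary loss defined in the paper.
    """
    if not visual_feats or len(visual_feats) != len(audio_feats):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(1.0 - cosine(v, a) for v, a in zip(visual_feats, audio_feats))
    return total / len(visual_feats)


# Toy features: the first pair of sequences points the same way at every step,
# the second pair is orthogonal at every step.
matched = alignment_loss([[1.0, 0.0], [0.0, 2.0]], [[1.0, 0.0], [0.0, 2.0]])
mismatched = alignment_loss([[1.0, 0.0], [0.0, 2.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Under this assumed objective, well-aligned real pairs give a loss near zero and misaligned pairs give a larger value, which is the kind of signal a detector could use to notice VA inconsistencies.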