Computer science
Representation (politics)
Embedding
Artificial intelligence
Perspective (graphical)
Generalization
Mode
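Modality (human-computer interaction)
Term (time)
Matrix decomposition
Pattern recognition (psychology)
Discriminant
Face (sociological concept)
Machine learning
Data mining
Mathematics
Mathematical analysis
Social science
Eigenvectors
Physics
Quantum mechanics
Sociology
Politics
Political science
Law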
Authors
Jianghao Wu,Baopeng Zhang,Zhaoyang Li,Guilin Pang,Teng Zhu,Jianping Fan
Identifier
DOI:10.1109/tcsvt.2023.3269841
Abstract
As face forgery techniques mature, the proliferation of deepfakes may threaten the security of human society. Although existing deepfake detection methods perform well in in-dataset evaluation, their generalization ability still needs improvement, and the representation of imperceptible artifacts plays a significant role in it. In this paper, we propose an Interactive Two-Stream Network (ITSNet) to explore discriminant inconsistency representations from a cross-modality perspective. In particular, the patch-wise Decomposable Discrete Cosine Transform (DDCT) is adopted to extract fine-grained high-frequency clues, and information from the different modalities is exchanged through a dedicated interaction module. To perceive temporal inconsistency, we first develop a Short-term Embedding Module (SEM) to refine the subtle local inconsistency representation between adjacent frames, and then design a Long-term Embedding Module (LEM) to further refine the erratic temporal inconsistency representation from a long-range perspective. Extensive experiments on three public datasets show that ITSNet outperforms state-of-the-art methods in both in-dataset and cross-dataset evaluations.
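The abstract does not specify how DDCT works, so the following is only a rough illustration of the general idea of extracting high-frequency clues patch by patch. It is a minimal sketch using a plain orthonormal block DCT-II, not the paper's DDCT. The function names `dct_matrix` and `patchwise_high_freq`, the 8-pixel patch size, and the `u + v >= cutoff` high-pass rule are all assumptions made for this example.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n (rows are frequencies)."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # the DC row has its own normalization
    return mat


def patchwise_high_freq(img: np.ndarray, patch: int = 8, cutoff: int = 4) -> np.ndarray:
    """Hypothetical helper: zero the low-frequency DCT coefficients of each patch.

    Coefficients with u + v < cutoff are dropped in every non-overlapping
    patch, and the remainder is transformed back to the pixel domain.
    """
    h, w = img.shape
    if h % patch or w % patch:
        raise ValueError("image sides must be multiples of the patch size")
    d = dct_matrix(patch)
    # Split the image into non-overlapping patches: shape (h/p, w/p, p, p).
    blocks = (
        img.astype(np.float64)
        .reshape(h // patch, patch, w // patch, patch)
        .transpose(0, 2, 1, 3)
    )
    coeffs = d @ blocks @ d.T  # 2D DCT of every patch via broadcast matmul
    u, v = np.meshgrid(np.arange(patch), np.arange(patch), indexing="ij")
    coeffs = coeffs * ((u + v) >= cutoff)  # keep the high-frequency band only
    recon = d.T @ coeffs @ d  # inverse 2D DCT (d is orthonormal)
    # Stitch the patches back into an image of the original shape.
    return recon.transpose(0, 2, 1, 3).reshape(h, w)
```

Applied to a grayscale face crop, the output would keep mostly edges and fine texture, which is the kind of signal a frequency stream might consume alongside the RGB stream. How ITSNet actually decomposes and fuses the two streams is not described in the abstract.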