Authors
Jiexi Yan, Cheng Deng, Heng Huang, Wei Liu
Identifier
DOI: 10.1109/tpami.2024.3379752
Abstract
In the real world, effectively learning a consistent similarity measurement across different modalities is essential. Most existing similarity learning methods cannot handle cross-modal data well due to the modality gap and exhibit obvious performance degradation when applied to cross-modal data. To tackle this problem, we propose a novel cross-modal similarity learning method, called Causality-Invariant Interactive Mining (CIIM), that effectively captures informative relationships among different samples and modalities to derive modality-consistent feature embeddings in a unified metric space. Our CIIM tackles the modality gap from two aspects, i.e., sample-wise and feature-wise. Specifically, starting from the sample-wise view, we learn single-modality and hybrid-modality proxies to explore cross-modal similarity with elaborate metric losses, so that both sample-to-sample and sample-to-proxy correlations are taken into consideration. Furthermore, in the feature-wise aspect, we conduct a causal intervention to eliminate the modality bias and reconstruct the invariant causal embedding. To this end, we force the learned embeddings to satisfy the specific properties of our causal mechanism and derive causality-invariant feature embeddings in the unified metric space. Extensive experiments on two cross-modality tasks demonstrate the superiority of our proposed method over state-of-the-art methods.
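To make the proxy-based component of the abstract concrete, below is a minimal sketch of how single-modality and hybrid-modality proxies could be combined in a metric loss. It is an illustrative assumption reconstructed from the abstract alone, not the authors' actual CIIM implementation: the class name CrossModalProxyLoss, the parameters modal_proxies and hybrid_proxies, and the additive-margin softmax formulation are all hypothetical choices.

```python
# Illustrative sketch only (PyTorch): a proxy-based cross-modal metric loss
# with per-modality proxies plus shared hybrid-modality proxies.
# All names and the margin-softmax design are assumptions, not CIIM itself.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalProxyLoss(nn.Module):
    """Learn one proxy per class for each modality plus a hybrid proxy
    shared across modalities; pull each embedding toward the proxies of
    its own class and push it away from other classes' proxies."""

    def __init__(self, n_classes: int, embed_dim: int,
                 margin: float = 0.1, scale: float = 32.0):
        super().__init__()
        # Per-modality proxies: [2, n_classes, embed_dim] (e.g. image / text).
        self.modal_proxies = nn.Parameter(torch.randn(2, n_classes, embed_dim))
        # Hybrid-modality proxies shared by both modalities.
        self.hybrid_proxies = nn.Parameter(torch.randn(n_classes, embed_dim))
        self.margin, self.scale = margin, scale

    def _proxy_term(self, emb, proxies, labels):
        # Cosine similarity between embeddings and all class proxies.
        sims = F.normalize(emb, dim=1) @ F.normalize(proxies, dim=1).t()
        # Additive-margin softmax over classes (CosFace-style): subtract the
        # margin from the target-class logit, then scale.
        logits = self.scale * (sims - self.margin * F.one_hot(labels, sims.size(1)))
        return F.cross_entropy(logits, labels)

    def forward(self, emb, labels, modality):
        # modality: LongTensor of 0/1 flags selecting each sample's proxies.
        loss = self._proxy_term(emb, self.hybrid_proxies, labels)
        for m in (0, 1):
            mask = modality == m
            if mask.any():
                loss = loss + self._proxy_term(
                    emb[mask], self.modal_proxies[m], labels[mask])
        return loss


if __name__ == "__main__":
    # Tiny usage demo with random embeddings.
    loss_fn = CrossModalProxyLoss(n_classes=10, embed_dim=128)
    emb = torch.randn(8, 128)            # batch of embeddings
    labels = torch.randint(0, 10, (8,))  # class labels
    modality = torch.randint(0, 2, (8,)) # 0 = image, 1 = text (say)
    print(loss_fn(emb, labels, modality))
```

The hybrid-proxy term acts on all samples regardless of modality, which is one plausible way a single set of proxies could anchor both modalities in a unified metric space; the per-modality terms would then model modality-specific structure.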