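Computer science
Modal verb
Modality (human-computer interaction)
Embedding
Graph
Generality
Artificial intelligence
Dual (grammatical number)
Theoretical computer science
Pattern recognition (psychology)
Information retrieval
Polymer chemistry
Psychology
Art
Chemistry
Literature
Psychotherapist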
Authors
Dapeng Chen,Min Wang,Haobin Chen,Lin Wu,Jing Qin,Wei Peng
Identifier
DOI:10.1145/3503161.3548195
Abstract
Conventional methods address the cross-modal retrieval problem by projecting multi-modal data into a shared representation space. Such a strategy inevitably loses modality-specific information, which decreases retrieval accuracy. In this paper, we propose heterogeneous graph embeddings to preserve richer cross-modal information. The embedding from one modality is compensated with the aggregated embeddings from the other modality. In particular, a self-denoising tree search is designed to mitigate the "label noise" problem, making the heterogeneous neighborhood more semantically relevant. The dual-path aggregation tackles the "modality imbalance" problem, giving each sample comprehensive dual-modality information. The final heterogeneous graph embedding is obtained by feeding the aggregated dual-modality features to the cross-modal self-attention module. Experiments conducted on cross-modality person re-identification and image-text retrieval tasks validate the superiority and generality of the proposed method.
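The idea of compensating one modality's embedding with aggregated embeddings from the other modality can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the cosine-similarity neighbor selection, the softmax-style weighting, the concatenation step, and all names (`cosine`, `aggregate_neighbors`, `compensate`) are assumptions made for illustration. The self-denoising tree search and the cross-modal self-attention module are not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def aggregate_neighbors(query, candidates, k=2):
    """Weighted mean of the k candidates most similar to `query`.

    Assumption: weights are exp(cosine similarity), normalized to sum to 1.
    `candidates` must be non-empty.
    """
    ranked = sorted(candidates, key=lambda c: cosine(query, c), reverse=True)[:k]
    weights = [math.exp(cosine(query, c)) for c in ranked]
    total = sum(weights)
    dim = len(query)
    return [sum(w * c[i] for w, c in zip(weights, ranked)) / total for i in range(dim)]


def compensate(query, other_modality, k=2):
    """Concatenate a sample's own embedding with its aggregated cross-modal neighbors."""
    return list(query) + aggregate_neighbors(query, other_modality, k)


# Hypothetical toy data: one image embedding and three text embeddings.
image_emb = [1.0, 0.0]
text_embs = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
fused = compensate(image_emb, text_embs, k=2)
```

In this toy setup, `fused` keeps the original image embedding in its first two entries and appends a blend of the two most similar text embeddings, so the sample carries information from both modalities.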