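Computer science
Modal verb
Artificial intelligence
Invariant (physics)
Semantics (computer science)
Generator (circuit theory)
Class (philosophy)
Pattern
Natural language processing
Pattern recognition (psychology)
Mathematics
Power (physics)
Programming language
Polymer chemistry
Chemistry
Sociology
Physics
Quantum mechanics
Mathematical physics
Social science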
Authors
Kai Wang,Yifan Wang,Xing Xu,Zhiwei Cao,Xunliang Cai
Identifier
DOI:10.1109/icme52920.2022.9860026
Abstract
Zero-shot Cross-Modal Retrieval (ZS-CMR) is challenging because data distributions are heterogeneous across modalities and semantics are inconsistent between seen and unseen classes. Previous methods usually align data from different modalities at the class level by introducing auxiliary word embeddings of class labels. This approach has a fatal limitation: learning class-level information leads to intra-modal variance being ignored. To solve this problem, we propose Instance-Level Semantic Alignment (ILSA), a method that makes full use of instance-level information. We use two disentangling variational auto-encoders to decompose the data from the two modalities into modality-specific and modality-invariant features. With an instance-level semantic feature extractor and a distribution generator, ILSA can generate more appropriate distributions from the learned instance-level semantic features, without any auxiliary knowledge. We run experiments on six widely used datasets under two ZS-CMR scenarios; the results show that our method establishes new state-of-the-art performance on all datasets.
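The decomposition step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes toy linear encoders with random, untrained weights, and every name in it (`DisentanglingEncoder`, `kl_to_standard_normal`, the feature sizes) is hypothetical. It only shows the general idea of a variational encoder whose latent code is split into a modality-specific part and a modality-invariant part, with one encoder per modality.

```python
import math
import random


def init_linear(rng, n_in, n_out):
    """Random weights and zero biases for a toy affine layer (assumption)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, bias):
    """Apply an affine map: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class DisentanglingEncoder:
    """Hypothetical encoder for one modality with a two-part Gaussian latent."""

    def __init__(self, rng, n_in, n_specific, n_invariant):
        self.rng = rng
        self.n_specific = n_specific
        n_latent = n_specific + n_invariant
        self.mu_layer = init_linear(rng, n_in, n_latent)
        self.logvar_layer = init_linear(rng, n_in, n_latent)

    def encode(self, x):
        mu = linear(x, *self.mu_layer)
        logvar = linear(x, *self.logvar_layer)
        # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1).
        z = [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
             for m, lv in zip(mu, logvar)]
        k = self.n_specific
        return {"specific": z[:k], "invariant": z[k:],
                "mu": mu, "logvar": logvar}


def kl_to_standard_normal(mu, logvar):
    """KL divergence from N(mu, diag(exp(logvar))) to N(0, I)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))


rng = random.Random(0)
# One encoder per modality; input sizes differ, invariant sizes match.
image_encoder = DisentanglingEncoder(rng, n_in=8, n_specific=2, n_invariant=3)
text_encoder = DisentanglingEncoder(rng, n_in=5, n_specific=2, n_invariant=3)

image_code = image_encoder.encode([rng.random() for _ in range(8)])
text_code = text_encoder.encode([rng.random() for _ in range(5)])

# Only the invariant parts live in a shared space, so a cross-modal
# comparison would use those; squared distance is a stand-in here.
alignment_gap = sum((a - b) ** 2
                    for a, b in zip(image_code["invariant"],
                                    text_code["invariant"]))
```

In a trained model the weights would be learned with reconstruction, KL, and alignment objectives; the sketch omits training and the abstract's semantic feature extractor and distribution generator.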