Authors
Ying Chen, Dong Zhou, Lin Li, Junmei Han
Identifiers
DOI: 10.1007/978-3-030-85899-5_19
Abstract
Cross-modal retrieval plays a critical role in food-oriented applications. Modality alignment remains a challenging component of the process: a common embedding space must be learned in which items from the two modalities can be compared and retrieved effectively. Recent studies mainly rely on an adversarial loss or a reconstruction loss to align the modalities. However, these methods may extract insufficient features from each modality, resulting in low-quality alignments. In contrast, this paper proposes a method that combines multimodal encoders with adversarial learning to learn improved, efficient cross-modal embeddings for retrieval. The core of the proposed approach is a directional pairwise cross-modal attention mechanism that latently adapts representations from one modality to the other. Although the model is not particularly complex, experimental results on the benchmark Recipe1M dataset show that it outperforms current state-of-the-art methods.
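The abstract does not give implementation details, so the following is a minimal, hypothetical PyTorch sketch of what a directional pairwise cross-modal attention block could look like: queries come from the target modality and keys/values from the source modality, so each direction (text-to-image, image-to-text) is a separate instance. All names, tensor shapes, and the residual/normalization layout here are assumptions for illustration, not the authors' architecture.

# Hypothetical sketch of directional pairwise cross-modal attention;
# NOT the authors' code. One direction adapts a target modality
# (e.g. recipe text) toward a source modality (e.g. food image) by
# letting the target's tokens attend to the source's tokens.
import torch
import torch.nn as nn

class DirectionalCrossModalAttention(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # Queries come from the target modality; keys and values
        # come from the source modality.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # target: (batch, len_target, dim); source: (batch, len_source, dim)
        adapted, _ = self.attn(query=target, key=source, value=source)
        # Residual connection preserves the target modality's own signal.
        return self.norm(target + adapted)

# Usage: adapt text tokens toward image regions, and vice versa
# (shapes are illustrative only).
text = torch.randn(4, 20, 512)   # e.g. recipe token embeddings
image = torch.randn(4, 49, 512)  # e.g. 7x7 grid of image region features
text_to_image = DirectionalCrossModalAttention()
image_to_text = DirectionalCrossModalAttention()
text_adapted = text_to_image(text, image)    # -> (4, 20, 512)
image_adapted = image_to_text(image, text)   # -> (4, 49, 512)

Because the attention is directional, the two instances learn independent mappings; the adapted embeddings from both directions would then feed whatever alignment objective (e.g. the adversarial loss mentioned above) the full model uses.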