Modal verb
Complementarity (molecular biology)
Computer science
Similarity (geometry)
Benchmark (surveying)
Key (lock)
Semantic similarity
Artificial intelligence
Data mining
Pattern recognition (psychology)
Image (mathematics)
Geodesy
Biology
Chemistry
Genetics
Polymer chemistry
Computer security
Geography
Authors
Xinlei Pei,Zheng Liu,Shanshan Gao,Yijun Su
Identifier
DOI:10.1016/j.eswa.2022.119415
Abstract
Cross-modal retrieval takes a query from one modality to retrieve relevant results from another modality, and its key issue lies in how to learn the cross-modal similarity. The complete semantic information of a specific concept is widely scattered over multi-modal and multi-grained data, and most existing methods cannot capture it thoroughly enough to learn the cross-modal similarity accurately. Therefore, we propose a Multi-modal and Multi-grained Hierarchical Semantic Enhancement network (M2HSE), which contains two stages that obtain more complete semantic information by fusing the complementarity of multi-modal and multi-grained data. In stage 1, two classes of cross-modal similarity (primary similarity and auxiliary similarity) are calculated comprehensively in two subnetworks. In particular, the primary similarities from the two subnetworks are fused to perform cross-modal retrieval, while the auxiliary similarity provides a valuable complement to the primary similarity. In stage 2, a multi-spring balance loss is proposed to optimize the cross-modal similarity more flexibly. With this loss, the most representative samples are selected to establish a multi-spring balance system, which adaptively optimizes the cross-modal similarities until an equilibrium state is reached. Extensive experiments on public benchmark datasets clearly demonstrate the effectiveness of the proposed method and show competitive performance against state-of-the-art methods.
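To make the spring analogy concrete, the following is a minimal, self-contained sketch of how a Hooke's-law style loss over a cross-modal similarity matrix could look. It is an illustration under assumptions, not the paper's actual M2HSE formulation: the class name SpringBalanceLoss, the rest-length and stiffness parameters, and the hardest-negative selection rule are all hypothetical stand-ins for the multi-spring balance mechanism described in the abstract.

```python
# Illustrative sketch only: parameter names (rest_pos, rest_neg, stiffness,
# top_k) and the exact energy form are assumptions, not the published loss.
import torch
import torch.nn as nn


class SpringBalanceLoss(nn.Module):
    """Toy spring-analogy loss over a cross-modal similarity matrix.

    Each selected positive/negative pair acts like a spring whose elastic
    energy grows quadratically with the displacement of its similarity from
    a "rest length"; the gradient (the net force) vanishes once every spring
    sits at its rest length, i.e. at the equilibrium state.
    """

    def __init__(self, rest_pos=0.9, rest_neg=0.1, stiffness=1.0, top_k=3):
        super().__init__()
        self.rest_pos = rest_pos    # target similarity for matched pairs
        self.rest_neg = rest_neg    # target similarity for mismatched pairs
        self.stiffness = stiffness  # spring constant k in 0.5 * k * x^2
        self.top_k = top_k          # hardest negatives kept per query

    def forward(self, sim):
        # sim: (B, B) image-text similarity matrix; diagonal entries are the
        # ground-truth (positive) pairs.
        eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)

        pos = sim[eye]                              # positive similarities
        neg = sim.masked_fill(eye, float("-inf"))   # mask out positives
        neg, _ = neg.topk(self.top_k, dim=1)        # hardest negatives per query

        # Hooke's-law style elastic energies; clamping makes the springs only
        # pull positives up toward rest_pos and push negatives down to rest_neg.
        pos_energy = 0.5 * self.stiffness * (self.rest_pos - pos).clamp(min=0) ** 2
        neg_energy = 0.5 * self.stiffness * (neg - self.rest_neg).clamp(min=0) ** 2
        return pos_energy.mean() + neg_energy.mean()


if __name__ == "__main__":
    sim = torch.randn(8, 8, requires_grad=True)     # dummy similarity matrix
    loss = SpringBalanceLoss()(sim)
    loss.backward()
    print(float(loss))
```

In this toy version, "equilibrium" simply means every selected similarity has reached its rest length so the loss gradient is zero; the paper's actual system selects representative samples and adapts the optimization per pair, which this sketch does not attempt to reproduce.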