Computer science
Modal verb
Information retrieval
Image retrieval
Artificial intelligence
Image (mathematics)
Computer vision
Natural language processing
Materials science
Polymer chemistry
Authors
Bing Xia,Ruinan Yang,Yunxiang Ge,Jiabin Yin
Abstract
With the rapid advancement of Internet technology and the widespread adoption of smart devices, there has been a substantial increase in multimodal data that conveys identical semantics in diverse coding formats. To foster the advancement of social intelligence, scholars are increasingly investigating the semantic correlations among multimodal data, which represents a current research focal point. The primary objective of cross-modal retrieval is to accurately compute the similarity between dissimilar modalities and to efficiently retrieve relevant data from other modalities. The objective of this article is to provide a comprehensive overview of the advancements in cross-modal retrieval research. First, it presents a conceptual framework and problem formulation for cross-modal retrieval, elucidating the multimodal nature of image-text cross-modal retrieval. Second, it delves into semantic representation learning-based approaches for computing image-text cross-modal similarity and hash-based methods for facilitating cross-modal retrieval. Furthermore, a comparative analysis is conducted on widely adopted evaluation metrics for current cross-modal retrieval techniques, accompanied by an outlook on future research directions.
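The two families of methods the abstract surveys can be illustrated with a minimal toy sketch. The embeddings, item names, and the 0.5 binarization threshold below are all hypothetical stand-ins: real systems learn image and text encoders that map both modalities into a shared semantic space, compute a real-valued similarity (commonly cosine) for ranking, and, in hash-based approaches, binarize the representations into compact codes compared by Hamming distance for fast retrieval.

```python
import math

# Hypothetical toy embeddings in a shared 3-d semantic space.
# In practice these come from learned image/text encoders.
image_embeddings = {
    "img_dog": [0.9, 0.1, 0.2],
    "img_car": [0.1, 0.8, 0.3],
}
text_query = [0.85, 0.15, 0.25]  # e.g. an encoding of "a photo of a dog"

def cosine_similarity(a, b):
    """Real-valued cross-modal similarity in the shared space."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def to_hash(vec, threshold=0.5):
    """Thresholded binarization: a minimal stand-in for a learned hash code."""
    return [1 if x >= threshold else 0 for x in vec]

def hamming(h1, h2):
    """Hamming distance over binary codes enables fast approximate retrieval."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

# Semantic-representation route: rank images by cosine similarity to the text.
best = max(image_embeddings,
           key=lambda k: cosine_similarity(image_embeddings[k], text_query))

# Hash-based route: compare compact binary codes instead of real vectors.
query_code = to_hash(text_query)
distances = {k: hamming(to_hash(v), query_code)
             for k, v in image_embeddings.items()}
```

Here `best` resolves to `"img_dog"`, and its hash code also has the smallest Hamming distance to the query code, showing how the binary route approximates the real-valued ranking at a fraction of the storage and comparison cost.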