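Computer science
Word (group theory)
Focus (optics)
Relevance (law)
Natural language processing
Artificial intelligence
Task (project management)
Filter (signal processing)
Information retrieval
Matching (statistics)
Image (mathematics)
Semantic matching
Image retrieval
Computer vision
Linguistics
Mathematics
Physics
Law
Management
Economics
Philosophy
Optics
Statistics
Political science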
Authors
Song Yang, Qiang Li, Wenhui Li, Xuanya Li, Ran Jin, Bo Lv, Rui Wang, An-An Liu
Abstract
Image–text retrieval is a vital task in computer vision and has received growing attention because it connects cross-modality data. Its critical challenges are learning unified representations and closing the large gap between the visual and textual domains. Although many works over the past few decades have made significant progress in image–text retrieval, they still face the challenge of incomplete text descriptions of images, i.e., how to fully learn the correlations between relevant region–word pairs with semantic diversity. In this article, we propose a novel semantic completion and filtration (SCAF) method to alleviate this issue. Specifically, a text semantic completion module generates a complete semantic description of an image from multi-view text descriptions, guiding the model to fully explore the correlations of relevant region–word pairs. Meanwhile, an adaptive structural semantic matching module filters out irrelevant region–word pairs by considering the relevance score of each pair, which helps the model focus on learning the relevance of matching pairs. Extensive experiments show that SCAF outperforms existing methods on the Flickr30K and MSCOCO datasets, demonstrating the superiority of the proposed method.