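Computer science
Semantics (computer science)
Artificial intelligence
Modal verb
Exploit
Bridging (networking)
Matching (statistics)
Representation (politics)
Path (computing)
Pattern recognition (psychology)
Relation (database)
Image (mathematics)
Data mining
Mathematics
Computer network
Chemistry
Statistics
Computer security
Politics
Political science
Polymer chemistry
Law
Programming language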
Authors
Yan Wang,Yuting Su,Wenhui Li,Jun Xiao,Xuanya Li,An-An Liu
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology
[Institute of Electrical and Electronics Engineers]
Date: 2023-10-01
Volume (issue): 33 (10): 6144-6158
Citations: 15
Identifier
DOI:10.1109/tcsvt.2023.3254530
Abstract
Image-text matching plays a crucial role in bridging the cross-modal gap between vision and language, and has made great progress thanks to deep learning. However, existing methods still suffer from the long-tail problem: only a small proportion of the data contains highly frequent semantics, while a long tail consists of rare semantics. In this paper, we propose a novel Dual-path Rare Content Enhancement Network (DRCE) to tackle the long-tail issue. Specifically, Cross-modal Representation Enhancement (CRE) and Cross-modal Association Enhancement (CAE) are proposed to form a dual-path structure that enhances the representation and association of rare content using cross-modal prior knowledge. This structure can effectively exploit complementary cross-modal relations from different aspects and fuse this information adaptively through the proposed Adaptive Fusion Strategy (AFS). Moreover, we propose an alternative re-ranking strategy (ARR) that explores reciprocal contextual information to refine image-text matching results, which further suppresses the negative impact of the long-tail effect. Extensive experiments on two large-scale datasets show significant improvements and validate the superiority of our method.
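The abstract does not specify how ARR works, so the following is only a generic illustration of what re-ranking with reciprocal information can look like in image-text retrieval, not the authors' method. It assumes a precomputed similarity matrix `sim[i][t]` between image `i` and text `t`; the function name `rerank_i2t`, the parameter `k`, and the toy scores are all hypothetical.

```python
def rerank_i2t(sim, query, k=3):
    """Re-rank the top-k texts for image `query` using reverse (text-to-image) ranks.

    Generic reciprocal re-ranking sketch; NOT the ARR strategy from the paper.
    `sim[i][t]` is an assumed similarity score between image i and text t.
    """
    n_images = len(sim)
    n_texts = len(sim[0])

    # Forward search: the k texts most similar to the query image.
    candidates = sorted(range(n_texts), key=lambda t: -sim[query][t])[:k]

    def reverse_rank(t):
        # Position of the query image when text t retrieves images (0 = best).
        order = sorted(range(n_images), key=lambda i: -sim[i][t])
        return order.index(query)

    # Prefer texts that also rank the query image highly;
    # break ties with the original forward similarity.
    return sorted(candidates, key=lambda t: (reverse_rank(t), -sim[query][t]))


# Toy scores (made up): rows are images, columns are texts.
sim = [
    [0.90, 0.80, 0.10],
    [0.95, 0.20, 0.30],
    [0.10, 0.30, 0.70],
]
print(rerank_i2t(sim, query=0, k=2))  # → [1, 0]
```

In this toy case, text 0 scores highest for image 0 but matches image 1 even better, so the reverse check promotes text 1, whose best-matching image is the query itself.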