Computer science
Consistency (knowledge bases)
Semantics (computer science)
Natural language processing
Information retrieval
Task (project management)
Artificial intelligence
Modal verb
Commonsense
Representation (politics)
Knowledge representation and reasoning
Programming language
Politics
Economics
Chemistry
Management
Polymer chemistry
Law
Political science
Authors
Wenhui Li, Song Yang, Qiang Li, Xuanya Li, An-An Liu
Identifier
DOI: 10.1109/TMM.2023.3289753
Abstract
Image-text retrieval, a fundamental task in the cross-modal field, aims to model the relationship between the visual and textual modalities. Recent methods address this task only by learning conceptual and syntactic correspondences between cross-modal fragments, but without external knowledge these correspondences inevitably contain noise. To solve this issue, we propose a novel method, Commonsense-Guided Semantic and Relational Consistencies (CSRC), for image-text retrieval, which simultaneously expands semantics and relations to reduce cross-modal differences under the assumption that the semantics and relations of a true image-text pair should be consistent across the two modalities. Specifically, we first exploit commonsense knowledge to expand the specific concepts of the visual and textual graphs, and optimize semantic consistency by minimizing the differences in cross-modal semantic importance. We then extend the same relations to cross-modal concept pairs with semantic consistency, which implements relational consistency. After that, we combine external commonsense knowledge with internal correlations to enhance concept representations, and further optimize relational consistency by regularizing the importance differences between association-enhanced concepts. Extensive experiments on two popular image-text retrieval datasets demonstrate the effectiveness of the proposed method.
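The abstract's first objective, optimizing semantic consistency by minimizing differences in cross-modal semantic importance, can be illustrated with a short sketch. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the attention-style importance scoring, the symmetric KL penalty, the assumption that image and text concepts are already matched one-to-one after commonsense expansion, and all function and variable names are our own choices.

```python
# Hedged sketch of the semantic-consistency idea from the abstract: align the
# "importance" distributions that each modality assigns to a shared concept
# set. Shapes, the attention scoring, and the KL objective are illustrative
# assumptions, not the paper's actual formulation.
import torch
import torch.nn.functional as F


def importance_distribution(concept_feats: torch.Tensor,
                            query: torch.Tensor) -> torch.Tensor:
    """Softmax attention weights of each concept node w.r.t. a global query.

    concept_feats: (n_concepts, d) node embeddings of one modality's graph.
    query:         (d,) global embedding of the paired sample (assumed given).
    """
    scores = concept_feats @ query / concept_feats.shape[-1] ** 0.5
    return F.softmax(scores, dim=0)


def semantic_consistency_loss(img_concepts: torch.Tensor,
                              txt_concepts: torch.Tensor,
                              img_global: torch.Tensor,
                              txt_global: torch.Tensor) -> torch.Tensor:
    """Penalize divergence between cross-modal importance distributions.

    Assumes the i-th image concept is already matched to the i-th text
    concept (e.g., after commonsense-based expansion), so both
    distributions are defined over the same concept set.
    """
    p_img = importance_distribution(img_concepts, txt_global)
    p_txt = importance_distribution(txt_concepts, img_global)
    # Symmetrized KL: zero when both modalities weight the shared
    # concepts identically, growing as their rankings diverge.
    kl_fwd = F.kl_div(p_img.log(), p_txt, reduction="sum")
    kl_rev = F.kl_div(p_txt.log(), p_img, reduction="sum")
    return 0.5 * (kl_fwd + kl_rev)


if __name__ == "__main__":
    d, n = 64, 8  # toy embedding size and concept count
    torch.manual_seed(0)
    loss = semantic_consistency_loss(
        torch.randn(n, d), torch.randn(n, d),
        torch.randn(d), torch.randn(d))
    print(f"semantic consistency loss: {loss.item():.4f}")
```

In this reading, minimizing the loss pushes the two modalities to agree on which shared concepts matter for a matched pair; the relational-consistency term described next in the abstract would add an analogous penalty over concept-pair relations.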