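Computer science
Question answering
Consistency (knowledge bases)
Natural language
Natural language generation
Generator (circuit theory)
Task (project management)
Artificial intelligence
Code (set theory)
Generative grammar
Bridge (graph theory)
Semantic gap
Source code
Process (computing)
Natural language processing
Machine learning
Information retrieval
Image (mathematics)
Image retrieval
Programming language
Set (abstract data type)
Medicine
Power (physics)
Physics
Management
Quantum mechanics
Internal medicine
Economics
Authors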
Jiayuan Xie,Yi Cai,Jiali Chen,Ruohang Xu,Jiexin Wang,Qing Li
Source
Journal: IEEE Transactions on Image Processing
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume and pages: 33: 2652-2664
Citations: 1
Identifier
DOI:10.1109/tip.2024.3379900
Abstract
Visual question answering with natural language explanation (VQA-NLE) is a challenging task that requires models to not only generate accurate answers but also to provide explanations that justify the relevant decision-making processes. This task is accomplished by generating natural language sentences based on the given question-image pair. However, existing methods often struggle to ensure consistency between the answers and explanations due to their disregard of the crucial interactions between these factors. Moreover, existing methods overlook the potential benefits of incorporating additional knowledge, which hinders their ability to effectively bridge the semantic gap between questions and images, leading to less accurate explanations. In this paper, we present a novel approach denoted the knowledge-based iterative consensus VQA-NLE (KICNLE) model to address these limitations. To maintain consistency, our model incorporates an iterative consensus generator that adopts a multi-iteration generative method, enabling multiple iterations of the answer and explanation in each generation. In each iteration, the current answer is utilized to generate an explanation, which in turn guides the generation of a new answer. Additionally, a knowledge retrieval module is introduced to provide potentially valid candidate knowledge, guide the generation process, effectively bridge the gap between questions and images, and enable the production of high-quality answer-explanation pairs. Extensive experiments conducted on three different datasets demonstrate the superiority of our proposed KICNLE model over competing state-of-the-art approaches. Our code is available at https://github.com/Gary-code/KICNLE.
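The alternating loop described in the abstract (the current answer produces an explanation, which in turn guides a new answer) can be illustrated with a minimal sketch. This is not the authors' implementation; their code is in the repository linked above. The function names, argument layout, fixed iteration count, and toy stand-in generators below are all assumptions made for illustration only.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: (question, image_features, knowledge, conditioning_text) -> text
Generator = Callable[[str, List[float], List[str], str], str]


def iterative_consensus(
    question: str,
    image_features: List[float],
    knowledge: List[str],
    generate_explanation: Generator,
    generate_answer: Generator,
    initial_answer: str = "",
    num_iterations: int = 3,
) -> Tuple[str, str]:
    """Alternate between explanation and answer generation.

    Each round conditions the explanation on the current answer, then
    conditions the next answer on that explanation. The retrieved
    knowledge is passed to both generators as extra context.
    """
    answer = initial_answer
    explanation = ""
    for _ in range(num_iterations):
        explanation = generate_explanation(question, image_features, knowledge, answer)
        answer = generate_answer(question, image_features, knowledge, explanation)
    return answer, explanation


# Toy stand-ins (assumptions): a real system would use learned generators.
def toy_explainer(question: str, image: List[float], knowledge: List[str], answer: str) -> str:
    return f"because {knowledge[0]}" if knowledge else "because unknown"


def toy_answerer(question: str, image: List[float], knowledge: List[str], explanation: str) -> str:
    return "yes" if "umbrella" in explanation else "no"


if __name__ == "__main__":
    final_answer, final_explanation = iterative_consensus(
        "Is it raining?",
        [0.1, 0.2],
        ["people hold umbrella"],
        toy_explainer,
        toy_answerer,
    )
    print(final_answer, "|", final_explanation)
```

In this sketch the returned answer is always the one generated from the returned explanation, which is one simple way to keep the pair consistent; how the published model decides when to stop iterating is not stated in the abstract.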