Computer Science
Inference
Artificial Intelligence
Causal Reasoning
Natural Language Processing
Econometrics
Mathematics
Authors
Jiali Chen, Zhen-Ren Guo, Jiayuan Xie, Yi Cai, Qing Li
Identifier
DOI:10.1145/3581783.3612536
Abstract
The Visual Question Generation (VQG) task aims to generate meaningful, logically sound questions about a given image that target a specific answer. Existing methods mainly focus on the visual concepts present in the image and have shown remarkable performance in VQG. However, these models frequently learn highly co-occurring object relationships and attributes, an inherent bias in question generation. This previously overlooked bias causes models to over-exploit spurious correlations among the visual features, the target answer, and the question, so they may generate inappropriate questions that contradict the visual content or known facts. In this paper, we first introduce a causal perspective on VQG and adopt a causal graph to analyze the spurious correlations among these variables. Building on this analysis, we propose the Knowledge Enhanced Causal Visual Question Generation (KECVQG) model to mitigate the impact of spurious correlations in question generation. Specifically, KECVQG introduces an interventional visual feature extractor (IVE) that obtains unbiased visual features via disentanglement. A knowledge-guided representation extractor (KRE) then aligns the unbiased features with external knowledge. Finally, the output features from the KRE are fed into a standard transformer decoder to generate questions. Extensive experiments on the VQA v2.0 and OKVQA datasets show that KECVQG significantly outperforms existing models.
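The abstract describes a three-stage pipeline: image features pass through the IVE (disentangling to remove spurious correlations), then the KRE (fusing with external knowledge), and finally a transformer decoder. The minimal sketch below mirrors only that data flow; every module body here is a hypothetical placeholder (the simple split, additive fusion, and argmax "decoder" are illustrative stand-ins, not the paper's actual implementation).

```python
import numpy as np

# Hedged structural sketch of the KECVQG pipeline from the abstract.
# Only the stage ordering (visual -> IVE -> KRE -> decoder) follows the
# paper; all internals below are placeholder assumptions.

rng = np.random.default_rng(0)

def ive(visual_feats):
    """Interventional visual feature extractor (IVE) stand-in:
    'disentangles' by splitting each feature vector in half and
    keeping the first half as the (assumed) causal part."""
    d = visual_feats.shape[-1] // 2
    causal = visual_feats[..., :d]   # kept: unbiased features
    # confounded = visual_feats[..., d:]  # discarded in this sketch
    return causal

def kre(unbiased_feats, knowledge_emb):
    """Knowledge-guided representation extractor (KRE) stand-in:
    aligns unbiased features with external knowledge via a
    placeholder additive fusion (real model would learn this)."""
    return unbiased_feats + knowledge_emb[: unbiased_feats.shape[0]]

def decode(feats, vocab):
    """Stand-in for the transformer decoder: maps each fused
    feature vector to a vocabulary token by argmax."""
    idx = feats.argmax(axis=-1) % len(vocab)
    return [vocab[i] for i in idx]

visual = rng.normal(size=(3, 8))      # 3 image-region features, dim 8
knowledge = rng.normal(size=(3, 4))   # external knowledge embeddings, dim 4
vocab = ["what", "color", "is", "the", "object", "?"]

unbiased = ive(visual)                # shape (3, 4)
fused = kre(unbiased, knowledge)      # shape (3, 4)
question_tokens = decode(fused, vocab)
print(" ".join(question_tokens))
```

The point of the sketch is the interface between stages: the IVE narrows the representation before knowledge fusion, so the decoder only ever sees features that have passed through both debiasing and knowledge alignment.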