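Commonsense reasoning
Common sense
Computer science
Visual reasoning
Cognition
Artificial intelligence
Visualization
Bridge (graph theory)
Task (project management)
Human–computer interaction
Knowledge-based systems
Psychology
Internal medicine
Economics
Neuroscience
Management
Medicine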
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology
[Institute of Electrical and Electronics Engineers]
Date: 2021-03-01
Volume (issue): pages: 31 (3): 1042-1054
Times cited: 4
Identifiers
DOI: 10.1109/tcsvt.2020.2991866
Abstract
When glancing at an image, humans can infer what is hidden in it beyond what is visually obvious, such as the functions of objects and people's intentions and mental states. Such visual reasoning is extremely difficult for computers, however, because it requires knowledge of how the world works. To address this, we propose the Commonsense Knowledge based Reasoning Model (CKRM), which acquires external knowledge to support the Visual Commonsense Reasoning (VCR) task, in which a computer must answer challenging visual questions. Our key ideas are as follows. (1) To bridge the gap between recognition-level and cognition-level image understanding, we inject external commonsense knowledge through a multi-level knowledge transfer network that jointly transfers information at the cell, layer, and attention levels. This captures knowledge from different perspectives and lets the model acquire human common sense before it reasons about the image. (2) To further promote cognition-level image understanding, we propose a knowledge-based reasoning approach that relates the transferred knowledge to the visual content and composes reasoning cues to derive the final answer. Experiments on the challenging VCR dataset verify the effectiveness of CKRM, which significantly improves reasoning performance and achieves state-of-the-art accuracy.
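The abstract does not give CKRM's equations, so the following is only a minimal sketch of the general idea in key idea (2): relating knowledge vectors to visual content and composing a reasoning cue to score candidate answers. It assumes plain dot-product attention and element-wise summation; the function names (`attend`, `reasoning_cue`, `pick_answer`) and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: Sequence[Vector]) -> Vector:
    """Dot-product attention: softmax(query . key) weights, then a weighted sum of keys."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]


def reasoning_cue(question: Vector,
                  visual: Sequence[Vector],
                  knowledge: Sequence[Vector]) -> Vector:
    # Hypothetical composition (assumption, not CKRM's actual design):
    # 1) ground the question in the image-region features,
    # 2) use that grounded context to select relevant knowledge vectors,
    # 3) sum the three views element-wise into one cue vector.
    visual_ctx = attend(question, visual)
    knowledge_ctx = attend(visual_ctx, knowledge)
    return [q + v + k for q, v, k in zip(question, visual_ctx, knowledge_ctx)]


def pick_answer(cue: Vector, answers: Sequence[Vector]) -> int:
    # Score each candidate answer by its dot product with the cue; return the best index.
    scores = [dot(cue, a) for a in answers]
    return max(range(len(answers)), key=lambda i: scores[i])


if __name__ == "__main__":
    question = [1.0, 0.0]
    visual = [[1.0, 0.0], [0.0, 1.0]]        # toy image-region features
    knowledge = [[2.0, 0.0], [0.0, 2.0]]     # toy transferred-knowledge vectors
    answers = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    cue = reasoning_cue(question, visual, knowledge)
    print(pick_answer(cue, answers))         # prints 0
```

In a real model the vectors would come from learned encoders and the composition would be trained end to end; the point here is only the data flow from question to visual context to knowledge to answer scores.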