Question answering
Computer science
Modal verb
Natural language processing
Bridging (networking)
Artificial intelligence
Information retrieval
Merging (version control)
Task (project management)
Semantics (computer science)
Open domain
Polymer chemistry
Management
Programming language
Chemistry
Economics
Computer network
Authors
順時 湯浅, Bingquan Liu, Chengjie Sun, Zhen Xu, Lin Ma, Baoxun Wang
Identifier
DOI: 10.1109/tmm.2023.3326616
Abstract
This paper proposes a new multi-modal question-answering task, Cross-Modal Information Complementation based Question Answering (CroMIC-QA), to promote exploration of bridging the semantic gap between visual and linguistic signals. The task is inspired by a common phenomenon in user-generated QA scenarios: the given textual question is often incomplete, so the semantics of both the text and the accompanying image must be merged to infer the complete real question. In this work, the CroMIC-QA task is first formally defined and compared with the classic Visual Question Answering (VQA) task. On this basis, a dedicated dataset, CroMIC-QA-Agri, is collected for the proposed task from an online QA community in the agriculture domain. A group of experiments is conducted on this dataset, in which typical multi-modal deep architectures are implemented and compared. The experimental results show that appropriate text/image representations and text-image semantic interaction methods effectively improve the performance of the framework.