Computer science
Question answering
Artificial intelligence
Relation (database)
Feature (linguistics)
Granularity
Feature extraction
Machine learning
Semantics (computer science)
Task (project management)
Convolutional neural network
Information retrieval
Deep learning
Natural language processing
Data mining
Philosophy
Operating system
Economics
Management
Programming language
Linguistics
Authors
He Wang, Haiwei Pan, Kejia Zhang, Shuning He, Chunling Chen
Identifier
DOI:10.1007/978-3-031-20865-2_11
Abstract
Medical Visual Question Answering (VQA) combines medical artificial intelligence with visual question answering and is a complex multimodal task. Its purpose is to produce accurate answers from medical images and natural-language questions, helping patients understand their own conditions and providing doctors with decision support. Although CV and NLP have driven great progress in medical VQA, challenges remain due to the characteristics of the medical domain. First, using a meta-learning model for image feature extraction can accelerate the convergence of medical VQA models, but the extracted features contain varying degrees of noise, which degrades feature fusion in medical VQA and thus reduces the model's accuracy. Second, existing medical VQA methods mine the relation between medical images and questions at only a single granularity, or focus only on relations within the question, and therefore cannot comprehensively capture the relation between medical images and questions. We thus propose a novel multi-granularity medical VQA model. On the one hand, we apply multiple meta-learning models and a convolutional denoising autoencoder for image feature extraction, and then refine the result with an attention mechanism. On the other hand, we represent question features at three granularities: words, phrases, and sentences. A keyword filtering module extracts keywords at the word granularity, and stacked attention modules at each granularity fuse the question features with the image features to mine their relation at multiple granularities. Experimental results on the VQA-RAD dataset demonstrate that the proposed method outperforms existing meta-learning medical VQA methods, improving overall accuracy by 1.8% over MMQ, with a larger advantage on long questions.
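To make the two components described in the abstract concrete, the following is a minimal PyTorch sketch of (a) attention-weighted fusion of features from several meta-learned image encoders after a denoising step, and (b) multi-granularity question fusion via stacked attention with a keyword-filtering gate. All class names, layer choices (e.g. the 1x1-convolution stand-in for the denoising autoencoder, the sigmoid keyword gate), dimensions, and the answer-vocabulary size are illustrative assumptions based only on the abstract, not the authors' actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenoisingAttentionFusion(nn.Module):
    """Fuse features from several pretrained (meta-learned) image encoders:
    a small convolutional stack stands in for the convolutional denoising
    autoencoder, then an attention layer weights each encoder's contribution."""
    def __init__(self, n_encoders=3, dim=512):
        super().__init__()
        self.denoise = nn.Sequential(          # stand-in for the denoising AE
            nn.Conv1d(dim, dim, kernel_size=1), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=1),
        )
        self.attn = nn.Linear(dim, 1)          # scores each encoder's feature

    def forward(self, feats):                  # feats: (B, n_encoders, dim)
        cleaned = self.denoise(feats.transpose(1, 2)).transpose(1, 2)
        weights = F.softmax(self.attn(cleaned), dim=1)   # (B, n_encoders, 1)
        return (weights * cleaned).sum(dim=1)            # (B, dim)


class StackedAttention(nn.Module):
    """One stacked-attention hop: a question vector attends over image
    regions and is refined by the attended visual context."""
    def __init__(self, dim=512):
        super().__init__()
        self.proj_q = nn.Linear(dim, dim)
        self.proj_v = nn.Linear(dim, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, q, v):                   # q: (B, dim), v: (B, R, dim)
        h = torch.tanh(self.proj_v(v) + self.proj_q(q).unsqueeze(1))
        alpha = F.softmax(self.score(h), dim=1)          # attention over regions
        return q + (alpha * v).sum(dim=1)                # refined question vector


class MultiGranularityVQA(nn.Module):
    """Fuse word-, phrase-, and sentence-level question features with image
    features through separate stacked-attention modules, then classify answers."""
    def __init__(self, dim=512, n_answers=458):          # n_answers: assumed size
        super().__init__()
        self.keyword_gate = nn.Linear(dim, 1)  # keyword filtering (word level)
        self.attn_word = StackedAttention(dim)
        self.attn_phrase = StackedAttention(dim)
        self.attn_sent = StackedAttention(dim)
        self.classifier = nn.Linear(3 * dim, n_answers)

    def forward(self, words, phrases, sentence, image_regions):
        # words: (B, T, dim); keep high-scoring (keyword) positions via a gate.
        gate = torch.sigmoid(self.keyword_gate(words))   # (B, T, 1)
        word_vec = (gate * words).sum(1) / gate.sum(1).clamp(min=1e-6)
        fused = torch.cat([
            self.attn_word(word_vec, image_regions),
            self.attn_phrase(phrases.mean(1), image_regions),
            self.attn_sent(sentence, image_regions),
        ], dim=-1)
        return self.classifier(fused)                    # (B, n_answers)


# Usage with random tensors standing in for encoder outputs.
B, dim = 2, 512
image = DenoisingAttentionFusion(3, dim)(torch.randn(B, 3, dim)).unsqueeze(1)
model = MultiGranularityVQA(dim)
logits = model(torch.randn(B, 12, dim),    # word features
               torch.randn(B, 5, dim),     # phrase features
               torch.randn(B, dim),        # sentence feature
               image)                      # fused image feature as one region
```

One design point worth noting: treating the answer as a classification target over a fixed answer vocabulary (rather than free-form generation) is the convention in the VQA-RAD literature that the abstract's accuracy comparison with MMQ implies; the specific vocabulary size above is an assumption.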