How Well Apply Multimodal Mixup and Simple MLPs Backbone to Medical Visual Question Answering?

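Keywords: Computer science; Upsampling; Generalization; Artificial intelligence; Feature (linguistics); Pooling; Machine learning; Simple (philosophy); Perceptron; Backbone network; Question answering; Code (set theory); Feature extraction; Data mining; Pattern recognition (psychology); Artificial neural network; Image (mathematics); Programming language; Philosophy; Mathematical analysis; Computer network; Set (abstract data type); Epistemology; Linguistics; Mathematics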

Authors

Lei Liu, Xiangdong Su

Identifier

DOI: 10.1109/bibm55620.2022.9995347

Abstract

Although current methods have significantly improved the performance of medical visual question answering (Med-VQA), two aspects remain worth exploring: simplifying the model structure and training models effectively on small-scale data. Unlike previous Med-VQA models, this paper employs only multi-layer perceptrons (MLPs) as the backbone network for feature extraction and modal fusion and builds a Med-VQA model on that basis, achieving superior performance with a simple backbone. To enhance model generalization, we design multimodal mixup (M-Mixup) to augment images and questions separately, which effectively alleviates the shortage of training samples in the Med-VQA task. To avoid destroying feature relationships when tokenizing the medical image, we design pooling tokens (PTs), a simple downsampling structure that captures fine-grained visual features without affecting the parameters and FLOPs of the entire model. Experimental results demonstrate that our model achieves state-of-the-art performance on SLAKE and remarkably competitive performance on VQA-RAD. The source code and models are available at https://github.com/Alivelei/M-Mixup.
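The abstract does not spell out how M-Mixup combines samples, so the following is only a minimal sketch of the general mixup idea applied to two modalities, not the authors' implementation (see the linked repository for that). It assumes that images and question embeddings are already dense arrays, that answers are one-hot vectors, and that one mixing coefficient drawn from a Beta distribution is shared by both modalities. The function name `multimodal_mixup` and the parameter `alpha` are hypothetical.

```python
import numpy as np


def multimodal_mixup(images, question_emb, answers_onehot, alpha=1.0, rng=None):
    """Mix each sample with a randomly chosen partner from the same batch.

    Assumption (not taken from the paper): the same coefficient `lam` and the
    same partner permutation are used for the image, the question embedding,
    and the answer label, so the mixed triple stays consistent.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch_size = images.shape[0]

    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    perm = rng.permutation(batch_size)    # partner index for every sample

    # Convex combination of each sample with its partner, per modality.
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_questions = lam * question_emb + (1.0 - lam) * question_emb[perm]
    mixed_answers = lam * answers_onehot + (1.0 - lam) * answers_onehot[perm]
    return mixed_images, mixed_questions, mixed_answers, lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.random((4, 3, 8, 8))        # toy batch: 4 RGB images, 8x8
    questions = rng.random((4, 12, 16))      # toy batch: 12 tokens, 16-dim embeddings
    answers = np.eye(5)[[0, 2, 2, 4]]        # one-hot labels over 5 answer classes

    mi, mq, ma, lam = multimodal_mixup(images, questions, answers, rng=rng)
    print(mi.shape, mq.shape, ma.shape)
```

Because each output is a convex combination, the mixed answer vectors still sum to 1 and can be used as soft targets with a cross-entropy loss. Whether the paper mixes raw inputs or embeddings, and whether it shares the coefficient across modalities, should be checked against the source code.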
