Authors
Shaokang Yang, Jianwei Niu, Jiyan Wu, Xuefeng Liu
Identifier
DOI:10.1007/978-3-030-60248-2_48
Abstract
Medical image report writing is a time-consuming and knowledge-intensive task. However, existing machine/deep learning models often generate repetitive reports and inaccurate descriptions. To address these issues, we propose a multi-view and multi-modal (MvMM) approach that exploits visual features from multiple perspectives together with medical semantic features to generate diverse and accurate medical reports. First, we design a multi-view encoder with attention to extract visual features from the frontal and lateral viewing angles. Second, we extract medical concepts from the radiology reports, which serve as semantic features and are combined with the visual features through a two-layer decoder with attention. Third, we fine-tune the model parameters using self-critical training with a coverage reward to generate more accurate medical concepts. Experimental results show that our method achieves noticeable performance improvements over the baseline approaches, increasing the CIDEr score by 0.157.
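The abstract does not spell out the architecture, so the following is only a minimal NumPy sketch of the general ideas it names: dot-product attention pooled separately over frontal and lateral view features before concatenation, and a simple concept-coverage reward. All function names, dimensions, and the reward definition here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over attention scores.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, features):
    """Dot-product attention: weight region features by similarity
    to the decoder query, then pool into one context vector."""
    scores = features @ query            # (n_regions,)
    weights = softmax(scores)            # attention distribution
    return weights @ features            # (d,) weighted sum

def fuse_views(query, frontal, lateral):
    """Attend to each view separately, then concatenate the pooled
    contexts as a multi-view visual representation (an assumption
    about how the two views might be fused)."""
    return np.concatenate([attend(query, frontal), attend(query, lateral)])

def coverage_reward(generated_concepts, reference_concepts):
    """Toy coverage reward: fraction of reference medical concepts
    mentioned in the generated report."""
    if not reference_concepts:
        return 0.0
    hit = len(set(generated_concepts) & set(reference_concepts))
    return hit / len(reference_concepts)

rng = np.random.default_rng(0)
d = 8
frontal = rng.standard_normal((49, d))   # e.g. 7x7 region features, frontal view
lateral = rng.standard_normal((49, d))   # lateral view
query = rng.standard_normal(d)           # decoder hidden state
ctx = fuse_views(query, frontal, lateral)
print(ctx.shape)                         # (16,): one pooled context per view
```

In self-critical training, a reward such as `coverage_reward` (typically combined with a caption metric like CIDEr) would score sampled reports against a greedy baseline to produce the policy-gradient signal.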