Computer science
Modal verb
Artificial intelligence
Encoder
Transformer
Fuse (electrical)
Pattern recognition (psychology)
Data mining
Engineering
Voltage
Polymer chemistry
Electrical engineering
Chemistry
Operating system
Authors
Liming Xu, Quan Tang, Bochuan Zheng, Jiancheng Lv, Weisheng Li, Xianhua Zeng
Source
Journal: IEEE Journal of Biomedical and Health Informatics
[Institute of Electrical and Electronics Engineers]
Date: 2024-06-14
Volume/Issue: 28 (9): 5600-5612
Identifier
DOI:10.1109/jbhi.2024.3414413
Abstract
Medical report generation, a cross-modal automatic text generation task, is highly significant in both research and clinical settings. The core task is to generate diagnostic reports in clinical language from medical images. However, several limitations persist, including a lack of global information, inadequate cross-modal fusion capability, and high computational demands. To address these issues, we propose the cross-modal global feature fusion Transformer (CGFTrans), which extracts global information while reducing computational strain. First, we introduce a mesh recurrent network to capture inter-layer information at different levels, addressing the absence of global features. Then, we design a feature fusion decoder and define a 'mid-fusion' strategy that separately fuses visual and global features with medical report embeddings, enhancing cross-modal joint learning. Finally, we integrate shifted window attention into the Transformer encoder to alleviate computational pressure and capture pathological information at multiple scales. Extensive experiments on three datasets show that the proposed method achieves average gains of 2.9%, 1.5%, and 0.7% on the BLEU-1, METEOR, and ROUGE-L metrics, respectively, while reducing training time by an average of 22.4% and increasing image throughput by 17.3%.
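The shifted window attention mentioned in the abstract restricts self-attention to small non-overlapping windows, alternating regular and cyclically shifted partitions so that information still flows across window boundaries (the idea popularized by Swin Transformer). A minimal single-head NumPy sketch of the mechanism follows; the function name, shapes, and the simplification Q = K = V are illustrative assumptions, not the paper's actual CGFTrans implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(feat, win=2, shift=0):
    """Self-attention restricted to non-overlapping win x win windows.

    feat:  (H, W, C) feature map
    win:   window side length (must divide H and W)
    shift: cyclic shift applied before partitioning (Swin-style)

    Simplified single-head sketch with Q = K = V = features.
    """
    H, W, C = feat.shape
    assert H % win == 0 and W % win == 0
    x = np.roll(feat, (-shift, -shift), axis=(0, 1))  # cyclic shift
    out = np.zeros_like(x)
    for i in range(0, H, win):
        for j in range(0, W, win):
            w = x[i:i+win, j:j+win].reshape(win * win, C)  # window tokens
            attn = softmax(w @ w.T / np.sqrt(C))           # scaled dot-product
            out[i:i+win, j:j+win] = (attn @ w).reshape(win, win, C)
    return np.roll(out, (shift, shift), axis=(0, 1))       # undo the shift

# Two consecutive blocks: regular windows, then shifted windows,
# so tokens in adjacent windows can exchange information.
feat = np.random.default_rng(0).normal(size=(4, 4, 8))
y = window_attention(window_attention(feat, win=2, shift=0), win=2, shift=1)
```

Because attention is computed per window, the cost grows linearly with the number of windows rather than quadratically with the full H x W token count, which is the source of the computational savings the abstract claims.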