Authors
Renjie Pan, Ruisheng Ran, Wei Hu, Wenfeng Zhang, Qibing Qin, Shujuan Cui
Source
Journal: IEEE Journal of Biomedical and Health Informatics
[Institute of Electrical and Electronics Engineers]
Date: 2023-01-01
Pages: 1-12
Identifier
DOI: 10.1109/jbhi.2023.3345932
Abstract
Intelligent medicine is eager to automatically generate radiology reports to ease the tedious work of radiologists. Previous research mainly focused on text generation with an encoder-decoder structure, while the CNN networks used for visual features ignored the long-range dependencies correlated with textual information. Besides, few studies exploit cross-modal mappings to promote radiology report generation. To alleviate these problems, we propose a novel end-to-end radiology report generation model dubbed the Self-Supervised dual-Stream Network (S3-Net). Specifically, a Dual-Stream Visual Feature Extractor (DSVFE) composed of ResNet and Swin Transformer is proposed to capture richer and more effective visual features, where the former focuses on local responses and the latter explores long-range dependencies. Then, we introduce a Fusion Alignment Module (FAM) to fuse the dual-stream visual features and facilitate alignment between visual and textual features. Furthermore, Self-Supervised Learning with Mask (SSLM) is introduced to further enhance the visual feature representation ability. Experimental results on two mainstream radiology reporting datasets (IU X-ray and MIMIC-CXR) show that our proposed approach outperforms previous models in terms of language generation metrics.
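The dual-stream idea in the abstract — a convolutional stream for local responses fused with an attention stream for long-range dependencies — can be sketched as follows. This is a minimal PyTorch illustration of the general pattern, not the paper's implementation: the tiny CNN stands in for ResNet, a single self-attention layer stands in for Swin Transformer, and the concatenate-and-project fusion is an assumption (the paper's FAM is more elaborate). All layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class DualStreamSketch(nn.Module):
    """Hypothetical sketch of a dual-stream visual feature extractor:
    one stream with local receptive fields, one with global token mixing,
    fused per-token. Stand-in for the DSVFE described in the abstract."""

    def __init__(self, dim=64):
        super().__init__()
        # CNN stream: local responses (stand-in for ResNet)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(7),  # pool to a 7x7 grid -> 49 tokens
        )
        # Attention stream: long-range dependencies (stand-in for Swin)
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=32, stride=32)  # 224 -> 7x7
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # Fusion: concatenate the two streams per token, project back to dim
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, x):  # x: (B, 3, 224, 224)
        local = self.cnn(x).flatten(2).transpose(1, 2)          # (B, 49, dim)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, 49, dim)
        global_feats, _ = self.attn(tokens, tokens, tokens)      # (B, 49, dim)
        return self.fuse(torch.cat([local, global_feats], dim=-1))

x = torch.randn(2, 3, 224, 224)
feats = DualStreamSketch()(x)
print(feats.shape)  # (2, 49, 64): one fused feature vector per image patch
```

The fused token sequence would then feed a text decoder; the masked self-supervised objective (SSLM) and the alignment loss in FAM are additional training signals not shown here.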