Computer science
Encoder
Artificial intelligence
Algorithm
Real-time computing
Operating system
Authors
Shujun Zhang, Qi Han, Jinsong Li, Yukang Sun, Yuhua Qin
Identifier
DOI:10.1016/j.bspc.2024.106251
Abstract
Automatic medical report generation can reduce radiologists' workload and improve the intelligence of computer-aided diagnosis, but it still faces two challenges: (1) small lesions are easily overlooked, so crucial information is missing from the report and accuracy is low; (2) generated long-text reports often suffer from jumbled word and sentence order, resulting in poor fluency. By simulating the cognitive process that professional physicians follow during their training and practice, this paper puts forward a medical report generation method that integrates a teacher–student model with an encoder–decoder network. The core idea is a cross-modal teacher (text)–student (image) model that applies different supervision strategies at different stages of report generation to improve the model's learning performance. A semantic space alignment mechanism is designed to enhance cross-modal feature matching by contrasting the encodings of the two modalities through adversarial learning, gradually optimizing the representations and capturing the critical information. A layer-supervised decoder built on the hierarchical structure of the Transformer is proposed, in which the teacher model guides the student model to decode layer by layer, improving the fluency of the generated reports. Comparative experiments against various other methods on the IU-X-ray and MIMIC-CXR datasets show that the proposed method effectively improves the quality of the generated reports.
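The layer-by-layer guidance described above can be sketched as a per-layer distillation loss: each student decoder layer is pulled toward the hidden states of the corresponding teacher layer. The sketch below is a minimal illustration under assumed conventions (hidden states as NumPy arrays, mean-squared distance as the per-layer penalty); the function name and loss form are hypothetical, not the paper's exact formulation.

```python
import numpy as np

def layer_supervision_loss(student_layers, teacher_layers):
    """Hypothetical layer-wise supervision: mean-squared distance between
    corresponding teacher/student hidden states, averaged over layers."""
    assert len(student_layers) == len(teacher_layers)
    per_layer = [np.mean((s - t) ** 2)
                 for s, t in zip(student_layers, teacher_layers)]
    return sum(per_layer) / len(per_layer)

# Toy hidden states: 3 decoder layers, 4 tokens, 8-dim features.
rng = np.random.default_rng(0)
teacher = [rng.normal(size=(4, 8)) for _ in range(3)]
# A student whose states sit close to the teacher's incurs a small loss.
student = [t + 0.1 * rng.normal(size=t.shape) for t in teacher]
loss = layer_supervision_loss(student, teacher)
```

In training, such a term would be added to the usual report-generation (cross-entropy) loss so that intermediate decoder layers, not only the final output, are supervised by the teacher.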