Rethinking masked image modeling for medical image representation

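Mask (illustration); Computer science; Artificial intelligence; Segmentation; Medical imaging; Representation (politics); Pattern recognition (psychology); Image (mathematics); Computer vision; Politics; Political science; Law; Art; Visual arts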

Authors

Yutong Xie, Lin Gu, Tatsuya Harada, Jianpeng Zhang, Yong Xia, Qi Wu

Source

Journal: Medical Image Analysis [Elsevier]
Date: 2024-08-17; Volume 98, article 103304; Citations: 1

Links

nih.gov, doi.org

Identifiers

DOI: 10.1016/j.media.2024.103304

Abstract

Masked Image Modelling (MIM), a form of self-supervised learning, has garnered significant success in computer vision by improving image representations using unannotated data. Traditional MIM methods typically mask regions sampled at random across the image. However, this random masking may not be well suited to medical imaging, whose characteristics differ from those of natural images. In medical imaging, particularly in pathology, disease-related features are often exceedingly sparse and localized, while the remaining regions appear normal and undifferentiated. Additionally, medical images are frequently accompanied by reports that directly pinpoint the location of pathological changes. Inspired by this, we propose Masked medical Image Modelling (MedIM), to our knowledge the first approach that employs radiological reports to guide the masking and restore the informative areas of images, encouraging the network to explore stronger semantic representations from medical images. We introduce two mutually comprehensive masking strategies: knowledge-driven masking (KDM) and sentence-driven masking (SDM). KDM uses Medical Subject Headings (MeSH) words unique to radiology reports to identify symptom clues mapped to MeSH words (e.g., cardiac, edema, vascular, pulmonary) and guide the mask generation. Recognizing that radiological reports often comprise several sentences detailing varied findings, SDM integrates sentence-level information to identify key regions for masking. MedIM reconstructs images informed by this masking from the KDM and SDM modules, promoting a comprehensive and enriched medical image representation. Our extensive experiments on seven downstream tasks, covering multi-label/class image classification, pneumothorax segmentation, and medical image-report analysis, demonstrate that MedIM with report-guided masking achieves competitive performance. Our method substantially outperforms ImageNet pre-training, MIM-based pre-training, and medical image-report pre-training counterparts. Codes are available at https://github.com/YtongXie/MedIM.
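
The idea of report-guided masking can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it only assumes that image patches and report keywords have already been embedded in a shared space, and it masks the patches most similar to any keyword rather than sampling them at random. The function name `report_guided_mask`, its parameters, and the cosine-similarity scoring are all hypothetical choices made for illustration.

```python
import numpy as np


def report_guided_mask(patch_emb, word_emb, mask_ratio=0.5):
    """Return a boolean mask over patches (True = masked).

    Hypothetical helper, not taken from the MedIM code base.
    patch_emb: (num_patches, dim) image patch embeddings (assumed given).
    word_emb:  (num_words, dim) report keyword embeddings (assumed given).
    """
    # Cosine similarity between every patch and every keyword.
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    w = word_emb / np.linalg.norm(word_emb, axis=1, keepdims=True)
    sim = p @ w.T  # shape: (num_patches, num_words)

    # Score each patch by its best-matching keyword.
    score = sim.max(axis=1)

    # Mask the highest-scoring patches instead of random ones.
    num_masked = int(round(mask_ratio * len(score)))
    masked_idx = np.argsort(-score)[:num_masked]
    mask = np.zeros(len(score), dtype=bool)
    mask[masked_idx] = True
    return mask


# Toy usage: 16 random patches; the two "keywords" are copies of
# patches 3 and 7, so those patches should always be masked.
rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))
words = patches[[3, 7]]
mask = report_guided_mask(patches, words, mask_ratio=0.25)
```

In this toy setting the mask concentrates on the patches that the keywords point to, which is the contrast with uniform random masking that the abstract draws.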
