Keywords: Masking, Computer science, Discriminative model, Artificial intelligence, Segmentation, Image, Pattern recognition, Representation learning, Computer vision
Authors
Yutong Xie,Lin Gu,Tatsuya Harada,Jianpeng Zhang,Yong Xia,Qi Wu
Identifier
DOI:10.1007/978-3-031-43907-0_2
Abstract
Masked image modelling (MIM)-based pre-training shows promise in improving image representations with limited annotated data by randomly masking image patches and reconstructing them. However, random masking may not be suitable for medical images because of their unique pathological characteristics. This paper proposes Masked medical Image Modelling (MedIM), which is, to our knowledge, the first approach to mask and reconstruct discriminative areas guided by radiological reports, encouraging the network to learn stronger semantic representations from medical images. We introduce two mutually complementary masking strategies: knowledge word-driven masking (KWM) and sentence-driven masking (SDM). KWM uses Medical Subject Headings (MeSH) words unique to radiology reports to identify discriminative image cues mapped to MeSH words and to guide mask generation. SDM exploits the fact that reports usually contain multiple sentences, each describing different findings, and integrates sentence-level information to identify discriminative regions for mask generation. MedIM combines both strategies by simultaneously restoring the images masked by KWM and SDM, yielding a more robust and representative medical visual representation. Our extensive experiments on various downstream tasks, covering multi-label/multi-class image classification, medical image segmentation, and medical image-text analysis, demonstrate that MedIM with report-guided masking achieves competitive performance, substantially outperforming ImageNet pre-training, MIM-based pre-training, and medical image-report pre-training counterparts. Code is available at https://github.com/YtongXie/MedIM .
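The core idea of report-guided masking (as opposed to MIM's random masking) is to mask the patches most relevant to cues from the report, forcing the network to reconstruct exactly the discriminative regions. The sketch below is not the authors' implementation; `patch_scores` (standing in for the cross-modal similarity between image patches and MeSH-word or sentence embeddings) and the simple top-k selection rule are assumptions used only to illustrate score-guided mask generation.

```python
import numpy as np

def report_guided_mask(patch_scores, mask_ratio=0.5):
    """Select the patches most relevant to report cues for masking.

    patch_scores : 1-D array of relevance scores, e.g. a hypothetical
        similarity between each image patch and report-derived embeddings.
    Returns a boolean mask: True = patch is masked (to be reconstructed).
    """
    n = patch_scores.shape[0]
    k = int(n * mask_ratio)
    # Mask the k highest-scoring (most discriminative) patches, so the
    # reconstruction objective focuses on report-relevant regions.
    top_idx = np.argsort(patch_scores)[-k:]
    mask = np.zeros(n, dtype=bool)
    mask[top_idx] = True
    return mask

# Toy example: 16 patches with random stand-in scores.
rng = np.random.default_rng(0)
scores = rng.random(16)
mask = report_guided_mask(scores, mask_ratio=0.25)
print(mask.sum())  # → 4
```

In MedIM's setting, two such masks (one from word-level KWM scores, one from sentence-level SDM scores) would be generated per image, and both masked views restored simultaneously.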