Computer science
Leverage (statistics)
Representation (politics)
Artificial intelligence
Image (mathematics)
Feature learning
Natural language processing
Machine learning
Pattern recognition (psychology)
Political science
Politics
Law
Authors
Chen Cheng,Aoxiao Zhong,Dufan Wu,Jie Luo,Quanzheng Li
Identifier
DOI:10.1007/978-3-031-43904-9_48
Abstract
Self-supervised learning (SSL) of visual representations from paired medical images and text reports has recently shown great promise for various downstream tasks. However, previous work has investigated the effectiveness of the two major SSL techniques, contrastive learning and masked autoencoding, separately, without exploring their potential synergies. In this paper, we aim to integrate the strengths of these two techniques by proposing a contrastive masked image-text modeling framework for medical visual representation learning. On one hand, our framework conducts cross-modal contrastive learning between masked medical images and text reports, incorporating a representation decoder to recover the misaligned information in the masked images. On the other hand, to further leverage masked autoencoding, a masked image is also required to reconstruct both the original image itself and the masked information in the text reports. With pre-training on a large-scale medical image and report dataset, our framework shows the complementary benefits of integrating the two SSL techniques on four downstream classification datasets. Extensive evaluations demonstrate consistent improvements of our method over state-of-the-art approaches, especially when very scarce labeled data are available. Code is available at https://github.com/cchen-cc/CMITM .
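The abstract describes an objective that couples cross-modal contrastive alignment with masked reconstruction. A minimal NumPy sketch of one way such a combined loss could look (the function names, the `temperature`, and the weighting factor `alpha` are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Cross-modal InfoNCE loss: matched image/report pairs sit on the diagonal."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature                    # (N, N) similarity matrix
    # log-softmax over each row, then take the diagonal (positive-pair) terms
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(logits))
    return -log_probs[idx, idx].mean()

def combined_loss(img_emb, txt_emb, recon_patches, target_patches, alpha=1.0):
    """Illustrative total objective: contrastive alignment + masked-patch MSE."""
    contrastive = info_nce(img_emb, txt_emb)
    reconstruction = np.mean((recon_patches - target_patches) ** 2)
    return contrastive + alpha * reconstruction
```

In an actual training loop, `img_emb` would come from a masked-image encoder (plus the representation decoder mentioned in the abstract), `txt_emb` from a text encoder over the report, and the reconstruction term would be computed only over masked patches, as in masked autoencoding.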