Authors
Xiaowei Yu, Lu Zhang, Zihao Wu, Dajiang Zhu
Source
Journal: IEEE Transactions on Medical Imaging
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume/Issue: 1-1
Identifier
DOI: 10.1109/tmi.2024.3482228
Abstract
Multi-modality learning, exemplified by the language-image pre-trained CLIP model, has demonstrated remarkable performance in enhancing zero-shot capabilities and has recently gained significant attention. However, directly applying the language-image pre-trained CLIP to medical image analysis encounters substantial domain shifts, resulting in severe performance degradation due to inherent disparities between natural (non-medical) and medical image characteristics. To address this challenge and uphold or even enhance CLIP's zero-shot capability in medical image analysis, we develop a novel approach, Core-Periphery feature alignment for CLIP (CP-CLIP), to jointly model medical images and the corresponding clinical text. To achieve this, we design an auxiliary neural network whose structure is organized by the core-periphery (CP) principle. This auxiliary CP network not only aligns medical image and text features into a unified latent space more efficiently but also ensures that the alignment is driven by principles of brain network organization. In this way, our approach effectively mitigates the domain shift and further enhances CLIP's zero-shot performance in medical image analysis. More importantly, the proposed CP-CLIP exhibits excellent explanatory capability, enabling the automatic identification of critical disease-related regions in clinical analysis. Extensive experiments and evaluations across five public datasets covering different diseases underscore the superiority of CP-CLIP in zero-shot medical image prediction and critical feature detection, showing its promising utility for multimodal feature alignment in current medical applications.
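The abstract gives only a high-level description of the CP alignment mechanism. As a rough, hypothetical sketch of the idea, the PyTorch snippet below constrains an auxiliary projection layer with a core-periphery connectivity mask (core nodes connect to all nodes, periphery nodes connect only to the core) and aligns image and text features with a standard CLIP-style symmetric contrastive loss. All names and choices here (CorePeripheryAligner, core_ratio, sharing one aligner across both modalities) are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of core-periphery (CP) feature alignment, loosely
# following the abstract's description. Not the authors' code: names and
# hyperparameters (CorePeripheryAligner, core_ratio, temperature) are
# illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def core_periphery_mask(n_nodes: int, core_ratio: float = 0.25) -> torch.Tensor:
    """Binary connectivity mask: core nodes connect to all nodes,
    periphery nodes connect only to the core (classic CP structure)."""
    n_core = max(1, int(n_nodes * core_ratio))
    mask = torch.zeros(n_nodes, n_nodes)
    mask[:n_core, :] = 1.0   # core rows reach every node
    mask[:, :n_core] = 1.0   # every node reaches the core
    return mask

class CorePeripheryAligner(nn.Module):
    """Auxiliary projection whose weight matrix is constrained by a CP mask,
    mapping frozen CLIP image/text features into a shared latent space."""
    def __init__(self, dim: int = 512, core_ratio: float = 0.25):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.weight)
        self.register_buffer("mask", core_periphery_mask(dim, core_ratio))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Masked linear map: periphery-to-periphery connections are zeroed.
        return F.normalize(x @ (self.weight * self.mask), dim=-1)

def clip_alignment_loss(img: torch.Tensor, txt: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    """Standard CLIP-style symmetric contrastive loss on aligned features."""
    logits = img @ txt.t() / temperature
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Usage with stand-ins for already-encoded CLIP features of a medical batch.
aligner = CorePeripheryAligner(dim=512)
img_feats = F.normalize(torch.randn(8, 512), dim=-1)  # placeholder image encoder output
txt_feats = F.normalize(torch.randn(8, 512), dim=-1)  # placeholder text encoder output
loss = clip_alignment_loss(aligner(img_feats), aligner(txt_feats))
loss.backward()

Zeroing periphery-to-periphery weights is one simple way to impose CP structure on a dense layer; the paper's actual network organization and training procedure may differ.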