Keywords: Nasopharyngeal carcinoma, Computer science, Artificial intelligence, Feature extraction, Medicine, Radiation therapy, Radiology
Authors
Zipei Wang, Mengjie Fang, Ling-Long Tang, Jie Tian, Di Dong
Identifier
DOI:10.1109/tmi.2025.3558775
Abstract
Providing precise and comprehensive diagnostic information to clinicians is crucial for improving the treatment and prognosis of nasopharyngeal carcinoma. Multi-modal foundation models, which can integrate data from various sources, have the potential to significantly enhance clinical assistance. However, several challenges remain: (1) the lack of large-scale visual-language datasets for nasopharyngeal carcinoma; (2) the inability of existing pre-training and fine-tuning methods to capture the hierarchical features required for complex clinical tasks; (3) current foundation models having limited visual perception due to inadequate integration of multi-modal information. While curriculum learning can improve a model's ability to handle multiple tasks through systematic knowledge accumulation, it still lacks consideration for hierarchical features and their dependencies, affecting knowledge gains. To address these issues, we propose the Hierarchical Feature Fusion Curriculum Learning method, which consists of three stages: visual knowledge learning, coarse-grained alignment, and fine-grained fusion. First, we introduce the Hybrid Contrastive Masked Autoencoder to pre-train visual encoders on 755K multi-modal images of nasopharyngeal carcinoma CT, MRI, and endoscopy to fully extract deep visual information. Then, we construct a 65K visual instruction fine-tuning dataset based on open-source data and clinician diagnostic reports, achieving coarse-grained alignment with visual information in a large language model. Finally, we design a Mixture of Experts Cross Attention structure for deep fine-grained fusion of global multi-modal information. Our model outperforms previously developed specialized models in all key clinical tasks for nasopharyngeal carcinoma, including diagnosis, report generation, tumor segmentation, and prognosis.
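The abstract's final stage describes a "Mixture of Experts Cross Attention" structure for fusing global multi-modal information into the language model, but gives no implementation details. The sketch below is a minimal, generic illustration of that idea, not the authors' actual architecture: each hypothetical expert applies its own key/value projection to the visual tokens, text tokens cross-attend to each expert's view, and a gating network mixes the expert outputs per token. All names, shapes, and the gating scheme are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v):
    # Scaled dot-product attention: text queries attend over visual keys/values.
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (Tq, Tk)
    return softmax(scores, axis=-1) @ v              # (Tq, d)

def moe_cross_attention(text_tokens, image_tokens, expert_weights, gate_w):
    # Hypothetical sketch: one (Wk, Wv) projection pair per expert,
    # plus a per-token gate computed from the text tokens.
    gates = softmax(text_tokens @ gate_w, axis=-1)   # (Tq, n_experts)
    outs = [
        cross_attention(text_tokens, image_tokens @ wk, image_tokens @ wv)
        for wk, wv in expert_weights
    ]
    outs = np.stack(outs, axis=0)                    # (n_experts, Tq, d)
    # Gate-weighted mixture of the experts' attention outputs.
    return np.einsum("qe,eqd->qd", gates, outs)      # (Tq, d)

rng = np.random.default_rng(0)
d, n_experts = 8, 3
text = rng.standard_normal((4, d))                   # 4 text tokens
image = rng.standard_normal((10, d))                 # 10 visual tokens
experts = [
    (0.1 * rng.standard_normal((d, d)), 0.1 * rng.standard_normal((d, d)))
    for _ in range(n_experts)
]
gate_w = rng.standard_normal((d, n_experts))
fused = moe_cross_attention(text, image, experts, gate_w)
print(fused.shape)
```

A full model would make the gate and projections learnable and interleave such blocks with the language model's layers; this sketch only shows the routing-plus-cross-attention pattern the abstract names.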