Computer science
Modality
Natural language processing
Artificial intelligence
Human–computer interaction
Speech recognition
Chemistry
Polymer chemistry
Authors
Chenlu Zhan, Yufei Zhang, Yu Lin, Gaoang Wang, Hongwei Wang
Identifiers
DOI: 10.1109/TMM.2024.3397191
Abstract
Medical vision-language pre-training (Med-VLP) models have recently accelerated fast-growing medical diagnostics applications. However, most Med-VLP models learn task-specific representations independently from scratch, leading to great inflexibility when working across multiple fine-tuning tasks. In this work, we propose UniDCP, a Unified medical vision-language model with Dynamic Cross-modal learnable Prompts, which can be plastically applied to multiple medical vision-language tasks within a unified model. Specifically, we explicitly construct a unified framework that harmonizes diverse inputs from multiple pre-training tasks by leveraging cross-modal prompts for unification, and which can accordingly accommodate heterogeneous medical fine-tuning tasks within the same model. Furthermore, we conceive a dynamic cross-modal prompt optimizing strategy that optimizes the prompts within a shareable space to implicitly process shareable clinical knowledge. UniDCP is the first Med-VLP model capable of performing all 8 medical uni-modal and cross-modal tasks over 14 corresponding datasets, consistently yielding superior results over diverse state-of-the-art methods.
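The abstract describes two mechanisms: learnable cross-modal prompts that let heterogeneous tasks share one backbone, and a dynamic strategy that optimizes those prompts within a shareable space. The sketch below is one hypothetical PyTorch reading of that idea, not the paper's actual implementation: the class names (PromptPool, UnifiedEncoder), the softmax-gated mixing of a shared prompt pool, and all dimensions are illustrative assumptions.

```python
# Hypothetical sketch of dynamic cross-modal prompts; architecture details
# are assumptions, not UniDCP's published design.
import torch
import torch.nn as nn

class PromptPool(nn.Module):
    """A shareable pool of learnable prompt vectors; each task selects a
    softmax-weighted mix of pool entries (one plausible reading of
    'dynamic cross-modal prompt optimizing')."""
    def __init__(self, pool_size: int, num_prompts: int, dim: int, num_tasks: int):
        super().__init__()
        self.pool = nn.Parameter(torch.randn(pool_size, dim) * 0.02)
        # Per-task logits over the shared pool; learned jointly with the pool.
        self.task_logits = nn.Parameter(torch.zeros(num_tasks, num_prompts, pool_size))

    def forward(self, task_id: int, batch_size: int) -> torch.Tensor:
        weights = self.task_logits[task_id].softmax(dim=-1)   # (P, pool_size)
        prompts = weights @ self.pool                         # (P, dim)
        return prompts.unsqueeze(0).expand(batch_size, -1, -1)

class UnifiedEncoder(nn.Module):
    """Prepends task-conditioned prompts to concatenated image/text tokens
    so heterogeneous tasks flow through one transformer backbone."""
    def __init__(self, dim: int = 256, num_tasks: int = 8):
        super().__init__()
        self.prompts = PromptPool(pool_size=16, num_prompts=4, dim=dim, num_tasks=num_tasks)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, image_tokens, text_tokens, task_id: int):
        b = image_tokens.size(0)
        p = self.prompts(task_id, b)                           # (B, P, dim)
        x = torch.cat([p, image_tokens, text_tokens], dim=1)   # unified sequence
        return self.backbone(x)

# Usage: a batch of 2 with 49 image-patch tokens and 12 text tokens.
enc = UnifiedEncoder()
img = torch.randn(2, 49, 256)
txt = torch.randn(2, 12, 256)
out = enc(img, txt, task_id=3)
print(out.shape)  # torch.Size([2, 65, 256]) -> 4 prompts + 49 + 12 tokens
```

Because the prompt pool is shared while the mixing weights are per-task, gradients from every fine-tuning task update the same pool parameters, which is one way shareable clinical knowledge could be accumulated implicitly across tasks.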