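Computer science
Artificial intelligence
Encoder
Feature extraction
Annotation
Computer vision
Feature (linguistics)
Medical imaging
Semantics (computer science)
Pattern recognition (psychology)
Natural language processing
Machine learning
Philosophy
Linguistics
Programming language
Operating system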
Authors
Xiaojun Chen,Ke Jia,Yaning Zhang,Jianping Gou,Anna Shen,Shaohua Wan
Source
Journal: IEEE Journal of Biomedical and Health Informatics
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume/Issue/Pages: 1-14
Citations: 1
Identifier
DOI:10.1109/jbhi.2024.3438254
Abstract
With advances in medical technology, ultrasonography has become an important diagnostic method in clinical practice. Static medical imaging such as CT and MRI has a larger body of prior research; ultrasonography, by contrast, produces dynamic, video-like images captured by a probe moving in real time. Processing such medical video data and extracting textual semantics from it across modalities therefore remains a difficult open problem. To address it, this paper proposes a pre-training model based on multimodal distillation and fusion encoding that models the semantic relationship between dynamic ultrasound images and text. First, a fusion encoder combines the visual geometric features of tissues and organs in dynamic ultrasound images, descriptive features of overall visual appearance, and named-entity linguistic features into a unified visual-linguistic feature, giving the model a richer ability to aggregate and align visual and linguistic cues. Then, the pre-training model is augmented with multimodal knowledge distillation to improve its learning ability. Experimental results on multiple datasets show that the multimodal distillation pre-training model generally improves the fusion of the various feature types in dynamic ultrasound images and achieves automated, accurate annotation of those images.
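The abstract names two mechanisms, feature fusion and knowledge distillation, without giving their exact form. The sketch below illustrates the general idea only, under stated assumptions: fusion is shown as plain concatenation of three feature vectors, and distillation as a temperature-softened KL divergence between teacher and student outputs. Neither is confirmed as the paper's actual design, and the names `fuse_features`, `distillation_loss`, and `temperature` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(geometric, appearance, entity):
    """Assumed fusion step: concatenate the three feature vectors into one.

    The paper's fusion encoder is likely more elaborate; concatenation is
    only a stand-in for "form a unified visual-linguistic feature".
    """
    return list(geometric) + list(appearance) + list(entity)


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Assumed distillation objective: KL(teacher || student) on softened outputs.

    The result is scaled by temperature squared, a common convention that
    keeps gradient magnitudes comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / max(ps, 1e-12))  # guard against underflow to zero
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    return kl * temperature ** 2


# Toy usage with made-up numbers (not data from the paper).
fused = fuse_features([0.1, 0.2], [0.3], [0.4, 0.5])
loss = distillation_loss([1.0, 0.5, -0.2], [1.2, 0.3, -0.1])
```

In this sketch the loss is zero when student and teacher logits match and grows as their softened distributions diverge, which is the signal a student model would minimize during pre-training.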