Mode
Computer science
Modality (human-computer interaction)
Artificial intelligence
Transformer
Segmentation
Representation (politics)
Machine learning
Pattern recognition (psychology)
Physics
Law
Voltage
Sociology
Political science
Politics
Quantum mechanics
Social science
Authors
Chengjian Qiu, Yuqing Song, Yi Liu, Yan Zhu, Kai Han, Victor S. Sheng, Zhe Liu
Identifier
DOI: 10.1016/j.bspc.2023.105827
Abstract
Accurate segmentation of brain tumors from multimodal MRI sequences is a critical prerequisite for brain tumor diagnosis, prognosis, and surgical treatment. However, one or more modalities are often missing in clinical practice, which can cause most previous methods that rely on complete multimodal data to fail. To deal with this problem, the current state-of-the-art Transformer-based approach directly fuses the available modality-specific features into a shared latent representation, with the aim of extracting common features that are robust to any combinatorial subset of the modalities. However, directly learning such a shared latent representation is not trivial because of the diversity of these combinatorial subsets. Furthermore, correlations across modalities as well as global multiscale features are not exploited by this Transformer-based approach. In this work, we propose a Multiscale Multimodal Vision Transformer (MMMViT), which not only leverages correlations across modalities to decouple the direct fusion procedure into two simple steps, but also fuses local multiscale features as the input of the intra-modal Transformer block to implicitly obtain global multiscale features and thereby adapt to brain tumors of various sizes. We experiment on the BraTS 2018 dataset with both full-modality and various missing-modality inputs, and the results demonstrate that the proposed method achieves state-of-the-art performance. Code is available at: https://github.com/qiuchengjian/MMMViT.
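The abstract names two architectural ideas: fusing local multiscale features per modality and feeding the fused tokens to an intra-modal Transformer block. The sketch below is a minimal, hypothetical illustration of those two ideas only; it is not the authors' released implementation (see the GitHub repository above), and all module names, scales, and dimensions are assumptions for illustration.

```python
# Minimal sketch (assumed design, not the MMMViT code): local multiscale
# feature fusion for one modality, followed by an intra-modal Transformer block.
import torch
import torch.nn as nn


class LocalMultiscaleFusion(nn.Module):
    """Extracts features at several receptive fields and fuses them with a 1x1 conv."""

    def __init__(self, channels: int, scales=(1, 2, 3)):
        super().__init__()
        # Dilated 3D convolutions keep the spatial size while enlarging the receptive field.
        self.branches = nn.ModuleList(
            nn.Conv3d(channels, channels, kernel_size=3, padding=s, dilation=s)
            for s in scales
        )
        self.fuse = nn.Conv3d(channels * len(scales), channels, kernel_size=1)

    def forward(self, x):  # x: (B, C, D, H, W), the feature map of one modality
        multiscale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(multiscale)


class IntraModalTransformerBlock(nn.Module):
    """Standard self-attention block applied to the tokens of a single modality."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, tokens):  # tokens: (B, N, dim)
        h = self.norm1(tokens)
        tokens = tokens + self.attn(h, h, h, need_weights=False)[0]
        return tokens + self.mlp(self.norm2(tokens))


if __name__ == "__main__":
    x = torch.randn(1, 32, 8, 8, 8)            # one modality's feature map (toy size)
    fused = LocalMultiscaleFusion(32)(x)       # local multiscale fusion
    tokens = fused.flatten(2).transpose(1, 2)  # (B, N, C) tokens for attention
    out = IntraModalTransformerBlock(32)(tokens)
    print(out.shape)                           # torch.Size([1, 512, 32])
```

Attending over tokens derived from the fused multiscale map is one plausible way the block could "implicitly" capture global multiscale context; the actual decoupled two-step fusion across modalities is described in the paper and repository.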