Computer science
Encoder
Artificial intelligence
Machine learning
Transformer
Deep learning
Pattern recognition (psychology)
Feature learning
Question answering
Visualization
Salience
Quantum mechanics
Operating system
Physics
Voltage
Authors
Richard J. Chen, Ming Y. Lu, Wei-Hung Weng, Tiffany Chen, Drew F. K. Williamson, Trevor Manz, Maha Shady, Faisal Mahmood
Identifier
DOI:10.1109/iccv48922.2021.00398
Abstract
Survival outcome prediction is a challenging weakly-supervised and ordinal regression task in computational pathology that involves modeling complex interactions within the tumor microenvironment in gigapixel whole slide images (WSIs). Despite recent progress in formulating WSIs as bags for multiple instance learning (MIL), representation learning of entire WSIs remains an open and challenging problem, especially in overcoming: 1) the computational complexity of feature aggregation in large bags, and 2) the data heterogeneity gap in incorporating biological priors such as genomic measurements. In this work, we present a Multimodal Co-Attention Transformer (MCAT) framework that learns an interpretable, dense co-attention mapping between WSIs and genomic features formulated in an embedding space. Inspired by approaches in Visual Question Answering (VQA) that can attribute how word embeddings attend to salient objects in an image when answering a question, MCAT learns how histology patches attend to genes when predicting patient survival. In addition to visualizing multimodal interactions, our co-attention transformation also reduces the space complexity of WSI bags, which enables the adaptation of Transformer layers as a general encoder backbone in MIL. We apply our proposed method on five different cancer datasets (4,730 WSIs, 67 million patches). Our experimental results demonstrate that the proposed method consistently achieves superior performance compared to the state-of-the-art methods.
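To make the co-attention idea in the abstract concrete, below is a minimal PyTorch sketch (not the authors' released MCAT code): genomic embeddings serve as attention queries over WSI patch embeddings, shrinking a bag of thousands of patches to a handful of genomic-guided tokens before a standard Transformer encoder. The class name, dimensions, and the pooling/risk head are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of genomic-guided co-attention for MIL survival prediction.
# All module names, dimensions, and the risk head are assumptions.
import torch
import torch.nn as nn

class CoAttentionMILSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        # Co-attention: queries come from genomic embeddings, keys/values
        # from WSI patch embeddings (batch_first=True -> [B, seq, d]).
        self.coattn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Transformer encoder over the (now small) genomic-guided bag.
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=2 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Attention pooling + linear risk head (a common MIL choice;
        # assumed here for illustration).
        self.pool = nn.Linear(d_model, 1)
        self.risk = nn.Linear(d_model, 1)

    def forward(self, patches, genes):
        # patches: [B, M, d] patch embeddings, M can be thousands per WSI
        # genes:   [B, N, d] genomic embeddings, with N << M
        h, attn_w = self.coattn(query=genes, key=patches, value=patches)
        h = self.encoder(h)                      # [B, N, d]
        w = torch.softmax(self.pool(h), dim=1)   # attention pooling over N
        z = (w * h).sum(dim=1)                   # [B, d] slide-level feature
        return self.risk(z), attn_w              # risk score + co-attention map

# Toy usage: one slide with 5000 patches and 6 genomic groups, embedded to 256-d.
model = CoAttentionMILSketch()
patches = torch.randn(1, 5000, 256)
genes = torch.randn(1, 6, 256)
risk, coattn = model(patches, genes)
print(risk.shape, coattn.shape)  # torch.Size([1, 1]) torch.Size([1, 6, 5000])
```

The complexity reduction the abstract mentions falls out of the query choice: the co-attention output has as many tokens as there are genomic queries (N), so the Transformer encoder runs over N tokens instead of M patches, and the returned attention map (N x M) is what can be visualized as patches attending to genes.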