Modelling-based joint embedding of histology and genomics using canonical correlation analysis for breast cancer survival prediction

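Breast cancer; Canonical correlation; Computer science; Embedding; Probabilistic logic; Artificial intelligence; Genomics; Correlation; Machine learning; Data mining; Pattern recognition (psychology); Cancer; Medicine; Mathematics; Genome; Biology; Internal medicine; Gene; Geometry; Biochemistry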

Authors

Vidhya Subramanian,Tanveer Syeda-Mahmood,N. Minh

Source

Journal: Artificial Intelligence in Medicine [Elsevier BV]
Date: 2024-03-01 Volume/issue: 149: 102787-102787 Citations: 1

Identifier

DOI: 10.1016/j.artmed.2024.102787

Abstract

Traditional approaches to predicting breast cancer patients’ survival outcomes were based on clinical subgroups, the PAM50 genes, or evaluation of the histological tissue. With the growth of multi-modality datasets capturing diverse information (such as genomics, histology, radiology and clinical data) about the same cancer, this information can be integrated using advanced tools, which has improved survival prediction. These methods implicitly exploit the key observation that different modalities originate from the same cancer source and jointly provide a complete picture of the cancer. In this work, we investigate the benefits of explicitly modelling multi-modality data as originating from the same cancer under a probabilistic framework. Specifically, we consider histology and genomics as two modalities originating from the same breast cancer under a probabilistic graphical model (PGM). We construct maximum likelihood estimates of the PGM parameters based on canonical correlation analysis (CCA) and then infer the underlying properties of the cancer patient, such as survival. Equivalently, we construct CCA-based joint embeddings of the two modalities and input them to a learnable predictor. Real-world properties of sparsity and graph structure are captured in the penalized variants of CCA (pCCA), which are better suited for cancer applications. To generate richer multi-dimensional embeddings with pCCA, we introduce two novel embedding schemes that encourage orthogonality. The efficacy of our proposed prediction pipeline is first demonstrated on simulated data via low prediction errors of the hidden variable and the generation of informative embeddings. When applied to breast cancer histology and RNA-sequencing expression data from The Cancer Genome Atlas (TCGA), our model can provide survival predictions with average concordance indices of up to 68.32% along with interpretability. We also illustrate how the pCCA embeddings can be used for survival analysis through Kaplan–Meier curves.
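
The CCA-based joint embedding described in the abstract can be illustrated with a minimal sketch on simulated data. This is not the authors' pipeline. It uses classical (unpenalized) CCA with a small ridge term rather than pCCA, a single hidden variable `z` stands in for the patient-level property, and a least-squares fit stands in for the learnable predictor. The names `fit_cca`, `joint_embedding`, the loadings `a` and `b`, and all dimensions are hypothetical choices made only for this illustration.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def fit_cca(X, Y, k=1, reg=1e-3):
    """Classical CCA via whitening + SVD (assumption: not the paper's pCCA).
    The ridge term `reg` only keeps the covariance matrices invertible."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky, full_matrices=False)
    return {"mx": mx, "my": my,
            "Wx": Kx @ U[:, :k], "Wy": Ky @ Vt[:k].T, "rho": s[:k]}

def joint_embedding(model, X, Y):
    """Concatenate the canonical variates of both modalities."""
    ex = (X - model["mx"]) @ model["Wx"]
    ey = (Y - model["my"]) @ model["Wy"]
    return np.hstack([ex, ey])

# Simulated two-modality data driven by one shared hidden variable z.
rng = np.random.default_rng(0)
n, p, q = 600, 10, 8
z = rng.standard_normal(n)
a = np.linspace(0.5, 1.5, p)    # hypothetical loadings for modality 1
b = np.linspace(-1.0, 1.0, q)   # hypothetical loadings for modality 2
X = np.outer(z, a) + 0.5 * rng.standard_normal((n, p))
Y = np.outer(z, b) + 0.5 * rng.standard_normal((n, q))

train, test = slice(0, 400), slice(400, n)
model = fit_cca(X[train], Y[train], k=1)
E_train = joint_embedding(model, X[train], Y[train])
E_test = joint_embedding(model, X[test], Y[test])

# Stand-in "learnable predictor": least squares with an intercept.
A_train = np.hstack([E_train, np.ones((E_train.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A_train, z[train], rcond=None)
A_test = np.hstack([E_test, np.ones((E_test.shape[0], 1))])
z_hat = A_test @ coef
mse = float(np.mean((z_hat - z[test]) ** 2))

print("first canonical correlation:", float(model["rho"][0]))
print("held-out MSE for the hidden variable:", mse)
```

In a survival setting, the least-squares step would presumably be replaced by a survival model fitted on the embeddings and evaluated with the concordance index. That substitution is an assumption of this sketch and is not taken from the abstract.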
