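Canonical correlation
Feature selection
Artificial intelligence
Omics
Correlation
Ranking (information retrieval)
Computer science
Machine learning
Data integration
Feature (linguistics)
Pattern recognition (psychology)
Data mining
Computational biology
Bioinformatics
Mathematics
Biology
Philosophy
Geometry
Linguistics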
Authors
Sehwan Moon, Jeongyoung Hwang, Hyunju Lee
Identifier
DOI: 10.1089/cmb.2021.0598
Abstract
Integrating multi-omics data offers opportunities to reveal biological mechanisms related to specific phenotypes. We propose a novel multi-omics integration method, supervised deep generalized canonical correlation analysis (SDGCCA), which models correlation structures between nonlinear multi-omics manifolds with the aim of improving phenotype classification and revealing phenotype-related biomarkers. SDGCCA addresses the limitations of other canonical correlation analysis (CCA)-based models (such as deep CCA and deep generalized CCA) by considering complex, nonlinear cross-data correlations between multiple (≥2) modalities. Although a few methods learn nonlinear CCA projections for classifying phenotypes, they consider only two views. Methods extended to multiple views either do not perform classification or do not provide feature ranking. In contrast, SDGCCA is a nonlinear multi-view CCA projection method that both performs classification and ranks features. When applied to predicting Alzheimer's disease (AD) in patients and to discriminating early- from late-stage cancers, SDGCCA outperformed other CCA-based and other supervised methods. In addition, we demonstrate that SDGCCA can be used for feature selection to identify important multi-omics biomarkers. When applied to AD data, SDGCCA identified clusters of genes in multi-omics data that are well known to be associated with AD.
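For readers unfamiliar with CCA, the classical linear two-view case that deep and generalized variants build on can be sketched as follows. This is not SDGCCA and not the authors' code: it is a minimal NumPy illustration under stated assumptions. The function name `linear_cca`, the ridge term `reg`, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np

def linear_cca(X, Y, k=1, reg=1e-6):
    """Classical two-view linear CCA (illustrative baseline, not SDGCCA).

    Returns projection matrices A, B and the top-k canonical correlations.
    `reg` is a small ridge term (an assumption) that keeps the covariance
    matrices invertible.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    A = Kx @ U[:, :k]
    B = Ky @ Vt.T[:, :k]
    return A, B, s[:k]

# Synthetic two-view data sharing a single latent factor z (made up for illustration).
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = z @ rng.normal(size=(1, 5)) + 0.1 * rng.normal(size=(500, 5))
Y = z @ rng.normal(size=(1, 4)) + 0.1 * rng.normal(size=(500, 4))

A, B, rho = linear_cca(X, Y, k=1)
print("first canonical correlation:", rho[0])
```

Linear CCA of this kind captures only linear relationships between exactly two views, and it uses no phenotype labels. These are the gaps that the abstract says deep, generalized, and supervised extensions are meant to close.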