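Omics
Feature selection
Computer science
Data integration
Machine learning
Identification (biology)
Cancer
Feature (linguistics)
Dimensionality reduction
Artificial intelligence
Data mining
Computational biology
Bioinformatics
Biology
Linguistics
Philosophy
Plants
Genetics
Authors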
Jesse C.S. Pang,Bilin Liang,Ruifeng Ding,Qi Yan,Ruiyao Chen,Jianpeng Xu
Abstract
The availability of high-throughput sequencing data creates opportunities to understand human diseases comprehensively, but the high dimensionality of such data makes it challenging to train machine learning models. Here, we propose a denoised multi-omics integration framework with two components: Feature Selection with Distribution (FSD), a distribution-based feature denoising algorithm for dimension reduction, and Attention Multi-Omics Integration (AttentionMOI), a multi-omics integration framework for predicting cancer prognosis and identifying cancer subtypes. We demonstrated that FSD improved model performance with both single-omics and multi-omics data across 15 cancers from The Cancer Genome Atlas (TCGA) program, for survival prediction and kidney cancer subtype identification. Our integration framework, AttentionMOI, outperformed machine learning models and current multi-omics integration algorithms on high-dimensional features. Furthermore, FSD identified features that were associated with cancer prognosis and could be considered as biomarkers.
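The abstract does not spell out how FSD works, so the following is only a generic illustration of the broad idea of filtering features by their distributions, not the authors' algorithm. As a stand-in technique it uses the two-sample Kolmogorov-Smirnov statistic to keep features whose values are distributed differently between two outcome groups. The function names, the sample layout, the feature names `gene_a` and `gene_b`, and the threshold of 0.5 are all assumptions made for this sketch.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The gap between two step functions can only change at observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def select_features(samples, labels, threshold=0.5):
    """Keep features whose distributions differ between label 0 and label 1.

    samples: list of dicts mapping feature name -> value (hypothetical layout)
    labels:  list of 0/1 group labels, one per sample
    threshold: assumed cutoff on the KS statistic, chosen only for illustration
    """
    kept = []
    for name in samples[0]:
        group0 = [s[name] for s, y in zip(samples, labels) if y == 0]
        group1 = [s[name] for s, y in zip(samples, labels) if y == 1]
        if ks_statistic(group0, group1) >= threshold:
            kept.append(name)
    return kept


# Toy data: gene_a separates the two groups, gene_b does not.
samples = [
    {"gene_a": 1.0, "gene_b": 5.0},
    {"gene_a": 1.2, "gene_b": 2.0},
    {"gene_a": 0.9, "gene_b": 4.0},
    {"gene_a": 3.1, "gene_b": 3.0},
    {"gene_a": 2.9, "gene_b": 5.5},
    {"gene_a": 3.3, "gene_b": 2.5},
]
labels = [0, 0, 0, 1, 1, 1]
selected = select_features(samples, labels)
print(selected)
```

In this toy data only `gene_a` passes the filter, because its values in the two groups do not overlap, while the `gene_b` distributions largely coincide. A real pipeline would more likely rely on a tested statistics library and correct for multiple testing across thousands of features.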