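Omics
Feature selection
Random forest
Dimensionality reduction
Computer science
Curse of dimensionality
Data mining
Gene ontology
Feature (linguistics)
Selection (genetic algorithm)
Artificial intelligence
Machine learning
Bioinformatics
Gene
Biology
Gene expression
Biochemistry
Linguistics
Philosophy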
Authors
Yanyu Hu, Long Zhao, Zhao Li, Xiangjun Dong, Tiantian Xu, Yuhai Zhao
Identifier
DOI: 10.1016/j.eswa.2022.116813
Abstract
Gastric cancer is among the malignant tumors with the highest incidence. The rapid development of high-throughput genomic technologies has greatly advanced the understanding of gastric cancer at the molecular level. However, single-omics data carry limited information, which motivates combining several types of omics data. Omics data are multivariate and high-dimensional, which reduces classification efficiency; dimensionality reduction is therefore an effective way to overcome the curse of dimensionality in omics data. Yet neural network learning algorithms are seldom used to improve classification accuracy during feature selection on multi-omics data. In this study, we therefore propose a random forest deep feature selection (RDFS) algorithm, which integrates gene expression (Exp) and copy number variation (CNV) data, reduces the dimensionality of the combined multi-omics data, and improves classification accuracy by combining a random forest with a deep neural network. Under RDFS, multi-omics data yielded higher accuracy and area under the curve (AUC) than single-omics data. Compared with other feature selection algorithms, RDFS also achieved higher prediction accuracy and AUC. We also validated the effect of feature selection on RDFS. Finally, survival analysis was used to evaluate the important genes identified during feature selection, and enriched gene ontology (GO) terms and biological pathways were obtained for these genes.
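The abstract gives no implementation details, so the following is only a minimal Python sketch of the data flow it describes, under stated assumptions. It assumes Exp and CNV features are integrated by concatenating them per sample (early integration), that features are ranked by importance scores supplied from outside (for example, by a random forest; the scores here are hypothetical inputs, not computed), and that predictions are scored with accuracy at a 0.5 threshold and with AUC in its pairwise (Mann-Whitney) form. The random forest and the deep neural network are omitted, and every function name below is a hypothetical illustration, not the RDFS implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical helpers illustrating pipeline stages named in the abstract.
# They are NOT the RDFS implementation, whose details are not given there.


def integrate_omics(exp: Dict[str, List[float]],
                    cnv: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Assumed early integration: for samples present in both tables,
    concatenate the Exp features followed by the CNV features."""
    shared = sorted(set(exp) & set(cnv))
    return {sample: exp[sample] + cnv[sample] for sample in shared}


def select_top_k(importances: Sequence[float], k: int) -> List[int]:
    """Indices of the k features with the largest importance scores,
    most important first. Scores are an input (e.g. from a random
    forest); how RDFS actually computes them is not modelled here."""
    order = sorted(range(len(importances)),
                   key=lambda i: importances[i], reverse=True)
    return order[:k]


def keep_columns(rows: Dict[str, List[float]],
                 idx: Sequence[int]) -> Dict[str, List[float]]:
    """Reduce every sample to the selected feature columns, in idx order."""
    return {sample: [values[i] for i in idx] for sample, values in rows.items()}


def accuracy(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded probability matches the 0/1 label."""
    correct = sum(int((p >= threshold) == (t == 1))
                  for t, p in zip(y_true, y_prob))
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive sample scores higher
    than a random negative one, counting ties as 1/2 (Mann-Whitney form).
    Runs in O(P * N) time, which is acceptable for a sketch."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with `exp = {"s1": [1.0, 2.0], "s2": [0.5, 0.1]}` and `cnv = {"s1": [0.0], "s2": [1.0]}`, `integrate_omics` yields three features per sample; `select_top_k([0.1, 0.5, 0.2], 2)` returns `[1, 2]`, and `roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])` returns `0.75`. The pairwise AUC keeps the definition visible; for large cohorts a rank-based computation would be preferable.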