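Random forest
Oversampling
Feature selection
Receiver operating characteristic
Selection (genetic algorithm)
Artificial intelligence
Computer science
Cross-validation
Pattern recognition (psychology)
Cancer
Statistics
Mathematics
Machine learning
Biology
Genetics
Bandwidth (computing)
Computer network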
Authors
Jie Liu,Cheng Zhong,Jiamin Zhang,Kejun Liu,Mengjie Liu
Identifier
DOI:10.1080/02648725.2023.2202524
Abstract
Gastric cancer (GC) is the third leading cause of cancer death worldwide. In medicine, machine learning is widely used for genetic data mining and the construction of diagnostic models. This study proposes an intelligent model, DERFS-XGBoost, for rapid and accurate diagnosis of GC based on gene expression data. First, GC data were collected and preprocessed. Second, ANOVA, the t-test, and fold change (FC) were used to select significantly differentially expressed genes (DEGs), random forest (RF) was used to calculate their importance, and sequential forward selection (SFS) was then used to obtain the optimal feature subset. Finally, XGBoost was used for classification after the synthetic minority oversampling technique (SMOTE) balanced the tumor and normal samples. To evaluate the results objectively, 10-fold cross-validation and 10 repeated experiments were used, and the average value of each evaluation index was used to assess classification performance. In the experiments, the DERFS-XGBoost model achieved an accuracy of 97.6%, a precision of 100%, a recall of 97.3%,
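
The balancing step named in the abstract can be illustrated with a minimal, standard-library sketch of the core SMOTE idea: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its nearest minority-class neighbours. This is not the authors' implementation. The function name `smote_oversample`, the toy expression values, the choice of `k`, and the assumption that normal samples are the minority class are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points built by interpolating between
    minority samples and their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 3 normal and 6 tumor samples, 2 genes each.
normal = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.2]]
tumor = [[3.0, 0.5], [3.2, 0.4], [2.9, 0.7], [3.1, 0.6], [3.3, 0.5], [2.8, 0.6]]

# Assumption: normal is the minority class, so it is topped up to match tumor.
balanced_normal = normal + smote_oversample(normal, len(tumor) - len(normal), k=2)
print(len(balanced_normal), len(tumor))
```

In a cross-validated evaluation such as the one described, oversampling would normally be applied to the training folds only, so that synthetic points never leak into the held-out fold; whether the study did this is not stated in the abstract.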