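Random forest
Oversampling
Feature selection
Receiver operating characteristic
Selection (genetic algorithm)
Artificial intelligence
Computer science
Cross-validation
Pattern recognition (psychology)
Cancer
Statistics
Mathematics
Machine learning
Biology
Genetics
Bandwidth (computing)
Computer network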
Authors
Jie Liu,Cheng Zhong,Jiamin Zhang,Kejun Liu,Mengjie Liu
Identifier
DOI:10.1080/02648725.2023.2202524
Abstract
Gastric cancer (GC) is the third leading cause of cancer death worldwide. In medicine, machine learning is widely used for genetic data mining and for building diagnostic models. This study proposes an intelligent model, DERFS-XGBoost, for rapid and accurate diagnosis of GC from gene expression data. First, GC data were collected and preprocessed. Second, ANOVA, the t-test and fold change (FC) were used to select significantly differentially expressed genes (DEGs), random forest (RF) was used to calculate their importance, and sequential forward selection (SFS) was then used to obtain the optimal feature subset. Finally, XGBoost was used for classification after the synthetic minority oversampling technique (SMOTE) had balanced the tumor and normal samples. To evaluate the results objectively, 10-fold cross-validation and 10 repeated experiments were used, and the average of each evaluation metric was taken as the measure of classification performance. In these experiments, the DERFS-XGBoost model achieved an accuracy of 97.6%, a precision of 100%, a recall of 97.3%, an F1 score of 99%, and an area under the receiver operating characteristic (ROC) curve (AUC) of 98.7%. The DERFS-XGBoost model differs from existing diagnostic models and achieved high classification performance with a small number of genes in comparison tests, which provides a new method and basis for the diagnosis of GC.
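Two steps named in the abstract, SMOTE balancing and the reported evaluation metrics, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names `smote_oversample` and `classification_metrics`, the neighbour count `k`, the toy expression values and the confusion-matrix counts are all hypothetical, chosen only to show the idea of interpolating between minority-class neighbours and of deriving accuracy, precision, recall and F1 from counts.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data: four minority-class samples, two expression values each.
minority_samples = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1]]
new_points = smote_oversample(minority_samples, n_new=6)

# Hypothetical confusion-matrix counts, not taken from the study.
metrics = classification_metrics(tp=36, fp=0, fn=1, tn=4)
```

In practice the oversampling would be applied only to the training portion of each cross-validation fold, so that synthetic samples never leak into the held-out data; a library implementation such as the SMOTE class in imbalanced-learn would normally replace the hand-written function.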