随机森林
小桶
分类器(UML)
宫颈上皮内瘤变
宫颈癌
特征选择
降维
基因
计算生物学
人工智能
医学
基因表达
生物
遗传学
计算机科学
癌症
基因本体论
作者
Jing Zhang,Xiuqing Yang,Jia Chen,Jing Han,Xiaofeng Chen,Yun Fan,Hui Zheng
摘要
The pathological phenotype of early-stage cervical cancer (CC) is similar to that of cervical intraepithelial neoplasia (CIN), which provides a challenge for the diagnosis of cervical precancerous lesions. Meanwhile, the existing diagnostic methods have certain subjectivity and limitations, resulting in the possibility of misdiagnosis or missed diagnosis. Hence, some methods are needed to assist diagnosis of CC and CIN.Based on the data of CIN and CC in gene expression omnibus (GEO) dataset, the eXtreme Gradient Boosting (XGBoost) algorithm was used to screen the feature genes between CIN and CC for constructing the classifier. Incremental feature selection (IFS) curve was also used for screening. The classifier was validated for reliability using principal component analysis (PCA) dimensionality reduction analysis and heat map analysis of gene expression. Then, differentially expressed genes of CIN and CC were intersected with the classifier genes. Genes in the intersection were used as seeds for protein-protein interaction network construction and restart random walk analysis. And the genes with the top 50 affinity coefficients were selected for gene ontology (GO) and kyoto encyclopedia of genes and genome (KEGG) enrichment analyses to observe the biological functions with differences between CIN and CC.The peripheral blood genes of CIN and CC were analyzed, and seven genes were screened. Using this gene for classifier construction, IFS curve screening revealed that the three-feature gene classifier constructed according to the random forest model had the best effect. The results of PCA dimensionality reduction analysis and gene expression heat map analysis showed that the three-gene classifier could effectively distinguish CIN from CC.A three-gene diagnostic classifier can effectively distinguish CIN patients from CC patients and provide a reference for the clinical diagnosis of early CC.
科研通智能强力驱动
Strongly Powered by AbleSci AI