Construction of a diagnostic classifier for cervical intraepithelial neoplasia and cervical cancer based on XGBoost feature selection and random forest model

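Keywords: Random forest; KEGG; Classifier; Cervical intraepithelial neoplasia; Cervical cancer; Feature selection; Dimensionality reduction; Gene; Computational biology; Artificial intelligence; Medicine; Gene expression; Biology; Genetics; Computer science; Cancer; Gene ontology
Authors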
Jing Zhang,Xiuqing Yang,Jia Chen,Jing Han,Xiaofeng Chen,Yun Fan,Hui Zheng
Source
Journal: Journal of Obstetrics and Gynaecology Research [Wiley]
Volume (Issue): 49 (1): 296-303 Cited by: 1
Identifier
DOI:10.1111/jog.15458
Abstract

The pathological phenotype of early-stage cervical cancer (CC) is similar to that of cervical intraepithelial neoplasia (CIN), which makes cervical precancerous lesions difficult to diagnose. Existing diagnostic methods are also partly subjective and limited, which can lead to misdiagnosis or missed diagnosis. Methods that assist the diagnosis of CC and CIN are therefore needed.

Based on CIN and CC data from the Gene Expression Omnibus (GEO) database, the eXtreme Gradient Boosting (XGBoost) algorithm was used to screen feature genes that distinguish CIN from CC for classifier construction. An incremental feature selection (IFS) curve was also used for screening. The reliability of the classifier was validated using principal component analysis (PCA) dimensionality reduction and heat map analysis of gene expression. Differentially expressed genes between CIN and CC were then intersected with the classifier genes. Genes in the intersection were used as seeds for protein-protein interaction network construction and random walk with restart analysis. The genes with the top 50 affinity coefficients were selected for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses to identify biological functions that differ between CIN and CC.

The peripheral blood genes of CIN and CC were analyzed, and seven genes were screened. When these genes were used for classifier construction, IFS curve screening showed that the three-feature gene classifier built with the random forest model performed best. PCA dimensionality reduction and gene expression heat map analysis showed that the three-gene classifier could effectively distinguish CIN from CC.

A three-gene diagnostic classifier can effectively distinguish CIN patients from CC patients and provide a reference for the clinical diagnosis of early CC.
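
The random walk with restart step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy network, the gene names `G1` to `G5`, the restart probability of 0.7, and the function name are all assumptions made for illustration, and the abstract does not state which parameters or software were used. The walker starts at the seed genes, moves to a random neighbour at each step, and returns to a seed with a fixed probability; the steady-state visiting probability serves as an affinity score for ranking genes.

```python
def random_walk_with_restart(adjacency, seeds, restart_prob=0.7,
                             tol=1e-10, max_iter=1000):
    """Return steady-state affinity scores for each node of an undirected graph.

    adjacency: dict mapping node -> set of neighbour nodes (toy input, assumed).
    seeds: set of nodes the walker restarts from.
    restart_prob: probability of jumping back to a seed (assumed value).
    """
    nodes = sorted(adjacency)
    # Restart vector: uniform probability over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Each neighbour m spreads its probability evenly over its own
            # neighbours (column-normalised transition matrix).
            inflow = sum(p[m] / len(adjacency[m]) for m in adjacency[n])
            nxt[n] = (1.0 - restart_prob) * inflow + restart_prob * p0[n]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:  # stop once the scores no longer change
            break
    return p


# Hypothetical five-gene interaction network with a single seed gene.
toy_network = {
    "G1": {"G2", "G3"},
    "G2": {"G1", "G3"},
    "G3": {"G1", "G2", "G4"},
    "G4": {"G3", "G5"},
    "G5": {"G4"},
}
scores = random_walk_with_restart(toy_network, seeds={"G1"})
# Rank genes by affinity, highest first; the study kept the top 50,
# which on a real network would be ranked[:50].
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy example the seed gene ranks first and the gene farthest from it ranks last, which is the behaviour the ranking relies on: genes close to the seeds in the network receive higher affinity scores.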