Authors
Zhao Xiao, Xinfeng Yan, Fĕi Li, Kang-Wen Xiao, G H Liu
Abstract
This study used an integrated bioinformatics approach to identify novel biomarkers that can predict asthma severity. This clinical medical study was conducted from June 2022 to December 2022 in the Department of Allergy, Zhongnan Hospital of Wuhan University. The gene chip dataset GSE43696 was screened and downloaded from the high-throughput Gene Expression Omnibus (GEO) database, and the data were preprocessed with the "affy" package in R using the "rma" algorithm. The "edgeR" and "limma" packages were used to screen for differentially expressed genes (DEGs) in pairwise comparisons among normal controls, patients with mild to moderate asthma, and patients with severe asthma. The "clusterProfiler" package was then used to perform GO enrichment analysis and KEGG pathway enrichment analysis of the DEGs, and the STRING website was used to construct a protein-protein interaction (PPI) network of the DEGs to further screen key genes. Weighted gene co-expression network analysis (WGCNA) was performed on dataset GSE43696 with the R package "WGCNA" to identify modules significantly related to asthma severity, and hub genes were obtained by intersecting the WGCNA results with the DEGs screened through the PPI network. Datasets GSE43696 and GSE63142 were used to verify the expression of the hub genes, their diagnostic value was evaluated with ROC curves, and the potential function of the hub genes in dataset GSE43696 was further clarified by gene set enrichment analysis (GSEA). A total of 251 DEGs were identified: 39 between the normal group and the mild to moderate asthma group, 178 between the normal group and the severe asthma group, and 34 between the mild to moderate asthma group and the severe asthma group. These DEGs were mainly involved in biological processes such as response to toxic substance, response to oxidative stress, extracellular structure organization, and extracellular matrix organization.
Two modules significantly correlated with asthma severity were identified (red module, P=7e-6, r=0.43; pink module, P=5e-8, r=-0.51), and six hub genes were finally obtained: B3GNT6, CEACAM5, CCK, ERBB2, CSH1 and DPPA5. Comparison of gene expression levels and ROC curve analysis in datasets GSE43696 and GSE63142 further verified the six hub genes, which may be associated with O-glycan biosynthesis, alpha-linolenic acid metabolism, linoleic acid metabolism, and pentose and glucuronate interconversions. In conclusion, using a variety of bioinformatics analysis methods, this study identified six hub genes significantly related to the severity of asthma, which may provide a new direction for the prediction and targeted therapy of asthma.
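
The two final steps of the workflow, intersecting WGCNA module genes with the PPI-screened DEGs and scoring each candidate with an ROC curve, can be illustrated with a minimal sketch. This is not the authors' code: the study used R packages, whereas the sketch below is plain Python, the function names `hub_genes` and `roc_auc` are hypothetical, and the gene labels and expression values are toy placeholders rather than data from GSE43696 or GSE63142. The AUC is computed as the fraction of case/control pairs in which the case has the higher value, which is equivalent to the area under the empirical ROC curve.

```python
from itertools import product


def hub_genes(module_genes, ppi_degs):
    """Return genes found in both the trait-related modules and the PPI-screened DEGs."""
    return sorted(set(module_genes) & set(ppi_degs))


def roc_auc(case_values, control_values):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    wins = 0.0
    for case, control in product(case_values, control_values):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Toy placeholders (assumptions for illustration, not results from the study).
module_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
ppi_degs = ["GENE_B", "GENE_D", "GENE_E"]
candidates = hub_genes(module_genes, ppi_degs)

# Hypothetical expression values of one candidate gene in two groups.
severe = [7.9, 8.4, 8.1, 9.0]
control = [7.2, 7.6, 8.0, 7.1]
auc = roc_auc(severe, control)

print(candidates, auc)
```

An AUC near 0.5 would indicate no discrimination between groups, while values near 1 (or near 0 for a down-regulated gene) indicate strong separation. In practice the same candidate would be scored in an independent dataset, as the study does with GSE63142.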