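Lasso (programming language)
Gene
Proportional hazards model
Classifier (UML)
Computational biology
Gene expression profiling
Computer science
Lung cancer
Gene expression
Regression
Machine learning
Artificial intelligence
Bioinformatics
Biology
Medicine
Oncology
Internal medicine
Genetics
Mathematics
World Wide Web
Statistics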
Authors
Hemant Kumar Joon, Anamika Thalor, Dinesh Gupta
Identifier
DOI:10.1016/j.compbiomed.2023.107430
Abstract
Lung squamous cell carcinoma (LUSC) patients are often diagnosed at an advanced stage and have poor prognoses. Thus, identifying novel biomarkers for LUSC is of utmost importance. Multiple datasets from the NCBI-GEO repository were obtained and merged to construct the complete dataset. We also constructed a subset of this complete dataset containing only known cancer driver genes. Machine learning classifiers were then employed to obtain the best features from both datasets. In parallel, we performed differential gene expression analysis, followed by survival and enrichment analyses. The kNN classifier performed comparatively better on the top 40 gene features of the complete dataset and the top 50 gene features of the driver dataset. Of these 90 gene features, 35 were found to be differentially regulated. Lasso-penalized Cox regression further reduced the number of genes to eight. The median risk score of these eight genes significantly stratified the patients, with low-risk patients having significantly better overall survival. We validated the robust performance of these eight genes on the TCGA dataset. Pathway enrichment analysis identified that these genes are associated with cell cycle, cell proliferation, and migration. This study demonstrates that an integrated approach involving machine learning and systems biology may effectively identify novel biomarkers for LUSC.
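The median risk score stratification mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common convention of a linear risk score (the sum of each gene's Cox coefficient times its expression value) and a split at the cohort median; the abstract does not spell out the exact formula. The expression matrix and coefficients below are synthetic placeholders, and the function names `risk_scores` and `stratify_by_median` are hypothetical, not taken from the study.

```python
import numpy as np


def risk_scores(expression, coefficients):
    """Linear risk score per patient: sum over genes of coefficient * expression."""
    expression = np.asarray(expression, dtype=float)
    coefficients = np.asarray(coefficients, dtype=float)
    return expression @ coefficients


def stratify_by_median(scores):
    """Label patients 'high' if their score exceeds the cohort median, else 'low'."""
    scores = np.asarray(scores, dtype=float)
    cutoff = np.median(scores)
    groups = np.where(scores > cutoff, "high", "low")
    return groups, cutoff


# Synthetic stand-ins (assumptions): 100 patients, 8 genes, random coefficients.
rng = np.random.default_rng(0)
n_patients, n_genes = 100, 8
expression = rng.normal(size=(n_patients, n_genes))
coefficients = rng.normal(scale=0.5, size=n_genes)

scores = risk_scores(expression, coefficients)
groups, cutoff = stratify_by_median(scores)
print("median cutoff:", cutoff)
print("high-risk patients:", int((groups == "high").sum()))
print("low-risk patients:", int((groups == "low").sum()))
```

In a real analysis the coefficients would come from the fitted penalized Cox model, and the two groups would then be compared with a survival test such as the log-rank test.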