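Computer science
Machine learning
Artificial intelligence
Boosting (machine learning)
Omics
Identification (biology)
DNA methylation
Stage (stratigraphy)
Cancer
Algorithm
Data mining
Bioinformatics
Medicine
Biology
Gene
Gene expression
Internal medicine
Paleontology
Biochemistry
Botany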
Authors
Baoshan Ma, Fanyu Meng, Yan Ge, Haowen Yan, Bingjie Chai, Fengju Song
Identifier
DOI:10.1016/j.compbiomed.2020.103761
Abstract
Accurate diagnostic classification of cancers can greatly help physicians choose surveillance and treatment strategies for patients. Following the explosive growth of biological data, the shift from traditional biostatistical methods to computer-aided approaches has made machine-learning methods an integral part of today's cancer prognosis prediction. In this work, we proposed a classification model that leverages extreme gradient boosting (XGBoost) and increasingly complex multi-omics data to separate early-stage from late-stage cancers. We applied the XGBoost model to four kinds of cancer data downloaded from TCGA and compared its performance with that of other popular machine-learning methods. The experimental results showed that our method obtained statistically significantly better or comparable predictive performance. The results also revealed that DNA methylation outperforms other molecular data (mRNA expression and miRNA expression) in accuracy and stability for discriminating between early-stage and late-stage groups. Furthermore, integrating multi-omics data with an autoencoder can enhance the accuracy of cancer stage classification. Finally, we conducted bioinformatics analyses to assess the medical utility of the significant genes, ranked by importance using the XGBoost algorithm. Extensive comparative experiments demonstrated that the XGBoost method performs remarkably well in predicting the stage of cancer patients with multi-omics data. Moreover, identifying novel candidate genes associated with cancer stages would contribute to further elucidating disease pathogenesis and developing novel therapeutics.
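The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: a boosted classifier separates two stage groups, and features are then ranked by their accumulated split gain. It is a from-scratch illustration of second-order gradient boosting with depth-1 trees, in the spirit of XGBoost; it is neither the authors' pipeline nor the xgboost library. The toy data, the function names (best_stump, fit, predict), and the hyperparameters (rounds, lr, lam) are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def best_stump(X, g, h, lam):
    """Find the depth-1 split with the highest regularized second-order gain.

    g and h are the per-sample first and second derivatives of the log-loss.
    Returns (gain, feature, threshold, left_weight, right_weight).
    """
    G, H = sum(g), sum(h)
    best = (0.0, None, None, 0.0, 0.0)
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        GL = HL = 0.0
        for a, b in zip(order, order[1:]):
            GL += g[a]
            HL += h[a]
            if X[a][j] == X[b][j]:
                continue  # no threshold fits between equal values
            GR, HR = G - GL, H - HL
            gain = 0.5 * (GL * GL / (HL + lam) + GR * GR / (HR + lam)
                          - G * G / (H + lam))
            if gain > best[0]:
                threshold = (X[a][j] + X[b][j]) / 2
                best = (gain, j, threshold, -GL / (HL + lam), -GR / (HR + lam))
    return best

def fit(X, y, rounds=20, lr=0.3, lam=1.0):
    """Boost stumps on the logistic loss; accumulate gain per feature."""
    scores = [0.0] * len(X)
    stumps = []
    importance = [0.0] * len(X[0])
    for _ in range(rounds):
        p = [sigmoid(s) for s in scores]
        g = [pi - yi for pi, yi in zip(p, y)]   # gradient of log-loss
        h = [pi * (1.0 - pi) for pi in p]       # hessian of log-loss
        gain, j, t, wl, wr = best_stump(X, g, h, lam)
        if j is None:
            break  # no split improves the objective
        stumps.append((j, t, lr * wl, lr * wr))
        importance[j] += gain
        scores = [s + (lr * wl if row[j] <= t else lr * wr)
                  for s, row in zip(scores, X)]
    return stumps, importance

def predict(stumps, row):
    score = sum(wl if row[j] <= t else wr for j, t, wl, wr in stumps)
    return 1 if score > 0 else 0

# Hypothetical toy data: 40 "patients", 4 "features". Only feature 0 carries
# the stage signal; the others are deterministic permutations of the same values.
n = 40
X = [[i / n, (7 * i % n) / n, (11 * i % n) / n, (13 * i % n) / n]
     for i in range(n)]
y = [1 if i >= n // 2 else 0 for i in range(n)]  # 0 = early stage, 1 = late stage

stumps, importance = fit(X, y)
ranking = sorted(range(len(importance)), key=lambda j: -importance[j])
accuracy = sum(predict(stumps, row) == label for row, label in zip(X, y)) / n
print("feature ranking by total gain:", ranking)
print("training accuracy:", accuracy)
```

On real omics data, one would normally use an established library and held-out evaluation instead of training accuracy. The sketch is only meant to make concrete how a gain-based importance ranking falls out of the boosting procedure.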