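Random forest
Interpretability
Computer science
Artificial neural network
Computational biology
Machine learning
Artificial intelligence
Ensemble learning
Ensemble forecasting
Systems biology
Deep learning
Data mining
Proteomics
Biology
Biochemistry
Gene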
Authors
Fan Xu,Shike Wang,Xinnan Dai,Piyushkumar A. Mundra,Jie Zheng
Source
Journal: Methods
[Elsevier]
Date: 2020-10-09
Volume and pages: 189: 65-73
Citations: 20
Identifier
DOI:10.1016/j.ymeth.2020.10.001
Abstract
Single-cell protein abundance is a fundamental type of information for characterizing cell states. Due to high cost and technical barriers, however, direct quantification of proteins is difficult. Single-cell RNA sequencing (scRNA-seq) data, serving as a cost-effective substitute for single-cell proteomics, may not accurately reflect protein expression levels due to measurement error, noise, post-transcriptional and translational regulation, etc. The recently emerging single-cell multimodal omics data, e.g. CITE-seq and REAP-seq, can simultaneously profile RNA and protein abundances in single cells, providing labeled data for predictive modeling in a supervised learning framework. A deep neural network-based transfer learning method has been applied to the imputation of surface protein abundances from single-cell transcriptomic data. However, it is unclear whether the artificial neural network is the best model, and it is desirable to improve the prediction performance (e.g. accuracy, interpretability) of machine learning models. In this paper, we compared several tree-based ensemble learning methods with neural network models, and found that ensemble learning often performed better than neural networks, and that Random Forest (RF) performed the best overall. Moreover, we used the feature importance scores from RF to interpret biological mechanisms underlying the prediction. Our study demonstrates the effectiveness of ensemble learning for reliable protein abundance prediction using single-cell multimodal omics data, and paves the way for knowledge discovery by mining single-cell multi-omics data at large scale.
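The idea of predicting a protein's abundance from RNA levels with a tree ensemble, then reading feature importance scores to see which genes drive the prediction, can be illustrated with a small sketch. This is not the authors' pipeline: it is a deliberately simplified stand-in for a Random Forest (bagged depth-1 regression trees with random feature subsets), written with the Python standard library only, and it runs on synthetic data. The names `best_stump`, `fit_forest` and `predict`, the data sizes, and the assumption that gene 0 alone drives the protein are all hypothetical choices made for illustration.

```python
import random

def best_stump(X, y, features):
    """Return (gain, feature, threshold, left_mean, right_mean) for the split
    that most reduces the sum of squared errors, or None if no split exists."""
    n = len(y)
    total_sum = sum(y)
    total_sq = sum(v * v for v in y)
    base_sse = total_sq - total_sum ** 2 / n
    best = None
    for f in features:
        order = sorted(range(n), key=lambda i: X[i][f])
        left_sum = left_sq = 0.0
        for k in range(1, n):
            i, j = order[k - 1], order[k]
            left_sum += y[i]
            left_sq += y[i] ** 2
            if X[i][f] == X[j][f]:
                continue  # cannot split between identical values
            right_sum, right_sq = total_sum - left_sum, total_sq - left_sq
            sse = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
            gain = base_sse - sse
            if best is None or gain > best[0]:
                best = (gain, f, (X[i][f] + X[j][f]) / 2, left_sum / k, right_sum / (n - k))
    return best

def fit_forest(X, y, n_trees=50, n_sub=2, seed=0):
    """Fit bagged stumps; importance is each feature's share of total error reduction."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    stumps, importance = [], [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of cells
        feats = rng.sample(range(p), n_sub)          # random subset of genes
        stump = best_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        if stump is None:
            continue
        gain, f, thr, left_mean, right_mean = stump
        stumps.append((f, thr, left_mean, right_mean))
        importance[f] += gain
    total = sum(importance) or 1.0
    return stumps, [v / total for v in importance]

def predict(stumps, row):
    """Average the stump predictions for one cell."""
    return sum(lm if row[f] <= thr else rm for f, thr, lm, rm in stumps) / len(stumps)

# Synthetic data (assumption): 200 cells, 5 genes, protein driven by gene 0 only.
data_rng = random.Random(42)
n_cells, n_genes = 200, 5
rna = [[data_rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_cells)]
protein = [3.0 * row[0] + data_rng.gauss(0, 0.5) for row in rna]

stumps, importance = fit_forest(rna, protein)
for gene, score in enumerate(importance):
    print(f"gene {gene}: importance {score:.3f}")
```

In this toy setting the importance scores should concentrate on gene 0, which mirrors how importance scores from a fitted ensemble can point to the genes most informative for a protein. A real analysis would use full-depth trees from an established library and paired RNA and protein measurements such as CITE-seq data.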