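Phenotype
Computational biology
Feature selection
Dimensionality reduction
Computer science
Scalability
Categorical variable
Biology
Machine learning
Artificial intelligence
Gene
Genetics
Database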
Authors
Tao Ren,Canping Chen,Alexey V. Danilov,Susan Liu,Xiangnan Guan,Shunyi Du,Xiwei Wu,Mara H. Sherman,Paul T. Spellman,Lisa M. Coussens,Andrew Adey,Gordon B. Mills,Ling‐Yun Wu,Zheng Xia
Identifier
DOI:10.1038/s42256-023-00656-y
Abstract
Accurately identifying phenotype-relevant cell subsets from heterogeneous cell populations is crucial for delineating the underlying mechanisms driving biological or clinical phenotypes. Here by deploying a Learning with Rejection strategy, we developed a novel supervised learning framework called PENCIL to identify subpopulations associated with categorical or continuous phenotypes from single-cell data. By embedding a feature selection function into this flexible framework, for the first time, we were able to simultaneously select informative features and identify cell subpopulations, enabling accurate identification of phenotypic subpopulations otherwise missed by methods incapable of concurrent gene selection. Furthermore, the regression mode of PENCIL presents a novel ability for supervised phenotypic trajectory learning of subpopulations from single-cell data. We conducted comprehensive simulations to evaluate PENCIL's versatility in simultaneous gene selection, subpopulation identification and phenotypic trajectory prediction. PENCIL is fast and scalable to analyse one million cells within 1 h. Using the classification mode, PENCIL detected T-cell subpopulations associated with melanoma immunotherapy outcomes. Moreover, when applied to single-cell RNA sequencing of a patient with mantle cell lymphoma with drug treatment across multiple timepoints, the regression mode of PENCIL revealed a transcriptional treatment response trajectory. Collectively, our work introduces a scalable and flexible infrastructure to accurately identify phenotype-associated subpopulations from single-cell data.

To detect phenotype-related cell subpopulations from single-cell data, appropriate feature sets need to be chosen or learned simultaneously. Ren et al. present here a tool based on Learning with Rejection, a method that during training learns features from cells that can be predicted with high confidence, while cells that the model is not yet certain about are rejected.
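The abstract does not spell out the loss that PENCIL optimises, so the following is only a minimal sketch of the general Learning with Rejection idea, not the authors' implementation. It assumes a binary phenotype coded as -1/+1, a predictor score h(x), a rejector score r(x) where r(x) <= 0 means "reject this cell", and a fixed rejection cost c. The function name, the toy values and the cost of 0.3 are all hypothetical.

```python
import numpy as np


def rejection_loss(y_true, pred_score, reject_score, cost=0.3):
    """Mean 0-1 loss with a reject option (illustrative, not PENCIL's loss).

    y_true       : phenotype labels in {-1, +1}, one per cell
    pred_score   : predictor output h(x); its sign is the predicted label
    reject_score : rejector output r(x); r(x) <= 0 rejects the cell
    cost         : penalty c charged for each rejected cell (assumed value)
    """
    y_true = np.asarray(y_true, dtype=float)
    pred_score = np.asarray(pred_score, dtype=float)
    reject_score = np.asarray(reject_score, dtype=float)

    accepted = reject_score > 0                # cells the model keeps
    misclassified = y_true * pred_score <= 0   # wrong side of the boundary
    # Accepted cells pay 1 if misclassified, 0 otherwise; rejected cells pay c.
    per_cell = np.where(accepted, misclassified.astype(float), cost)
    return per_cell.mean(), accepted


# Toy data: six cells, two of them predicted with low confidence and wrongly.
y = np.array([1, 1, -1, -1, 1, -1])
h = np.array([2.0, 0.1, -1.5, 0.2, -0.1, -3.0])
r = np.array([1.0, -0.5, 0.8, -0.2, -0.9, 1.2])

loss, accepted = rejection_loss(y, h, r, cost=0.3)
plain_error = float(np.mean(y * h <= 0))  # error rate if no cell is rejected
print(loss, accepted.tolist(), plain_error)
```

In this toy case the three low-confidence cells are rejected at cost 0.3 each, so the average loss falls below the plain error rate. A trainable version would replace the indicator functions with a differentiable surrogate, which this sketch leaves out.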