Universal Cell Embeddings: A Foundation Model for Cell Biology

Keywords: Foundation (evidence), Biology, Cell, Computational biology, Political science, Genetics, Law
Authors
Yanay Rosen, Yusuf Roohani, Ayush Agrawal, Leon Samotorcan, Tabula Sapiens Consortium, Stephen R. Quake, Jure Leskovec
Identifier
DOI:10.1101/2023.11.28.568918
Abstract

Developing a universal representation of cells which encompasses the tremendous molecular diversity of cell types within the human body and more generally, across species, would be transformative for cell biology. Recent work using single-cell transcriptomic approaches to create molecular definitions of cell types in the form of cell atlases has provided the necessary data for such an endeavor. Here, we present the Universal Cell Embedding (UCE) foundation model. UCE was trained on a corpus of cell atlas data from human and other species in a completely self-supervised way without any data annotations. UCE offers a unified biological latent space that can represent any cell, regardless of tissue or species. This universal cell embedding captures important biological variation despite the presence of experimental noise across diverse datasets. An important aspect of UCE's universality is that any new cell from any organism can be mapped to this embedding space with no additional data labeling, model training or fine-tuning. We applied UCE to create the Integrated Mega-scale Atlas, embedding 36 million cells, with more than 1,000 uniquely named cell types, from hundreds of experiments, dozens of tissues and eight species. We uncovered new insights about the organization of cell types and tissues within this universal cell embedding space, and leveraged it to infer function of newly discovered cell types. UCE's embedding space exhibits emergent behavior, uncovering new biology that it was never explicitly trained for, such as identifying developmental lineages and embedding data from novel species not included in the training set. Overall, by enabling a universal representation for every cell state and type, UCE provides a valuable tool for analysis, annotation and hypothesis generation as the scale and diversity of single cell datasets continues to grow.
