scGenePT: Is language all you need for modeling single-cell perturbations?

Computer Science
Authors
Ana-Maria Istrate, Donghui Li, Theofanis Karaletsos
Identifier
DOI:10.1101/2024.10.23.619972
Abstract

Modeling single-cell perturbations is a crucial task in single-cell biology. Predicting how up- or down-regulating a gene, or treating a cell with a drug, changes the cell's gene expression profile can open avenues for understanding biological mechanisms and, potentially, for treating disease. Most foundation models for single-cell biology learn from scRNA-seq counts, using experimental data as the modality from which gene representations are generated. The scientific literature, meanwhile, holds a wealth of information that can be used to generate gene representations from a different modality: language. In this work, we study the effect of using both language and experimental data to model genes for perturbation prediction. We show that textual representations of genes provide additive, complementary value to gene representations learned from experimental data alone when predicting perturbation outcomes in single-cell data. We find that textual representations alone are not as powerful as biologically learned gene representations, but they can serve as useful prior information. We also show that different types of scientific knowledge, expressed as language, induce different types of prior knowledge. For example, in the datasets we study, subcellular location helps most when predicting the effect of single-gene perturbations, while protein information helps most when modeling the effects of combinatorial perturbations, in which several genes are perturbed together and their effects interact. We validate these findings by extending scGPT, a popular foundation model trained on scRNA-seq counts, to incorporate language embeddings at the gene level. We start with the NCBI gene card and UniProt protein summaries from the genePT approach and add gene function annotations from the Gene Ontology (GO). We name the resulting model "scGenePT" to reflect the combination of ideas from these two models.
Our work sheds light on the value of integrating multiple sources of knowledge when modeling single-cell data, and highlights how language can enhance biological representations learned from experimental data.
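
The gene-level fusion described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the scGenePT implementation. It assumes that each gene's language embedding is passed through a learned, bias-free linear projection and added to that gene's learned (scGPT-style) token embedding, and that when several text sources cover a gene (for example NCBI summaries, UniProt summaries, or GO annotations) their embeddings are averaged. The function names `project`, `text_embedding`, and `fuse_gene_tokens` and the lookup-table inputs are hypothetical. In a real model the text vectors would come from an embedding model applied to the descriptions, and the projection would be trained jointly with the rest of the network.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]  # row-major, shape (d_model, d_text)


def project(w: Matrix, x: Vector) -> Vector:
    """Bias-free linear map y = W x, so an all-zero text vector contributes nothing."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def text_embedding(sources: List[Dict[str, Vector]], gene: str, d_text: int) -> Vector:
    """Average one gene's embedding over every text source that covers it.

    Each source is a hypothetical precomputed lookup (gene -> vector), e.g.
    embeddings of NCBI gene-card summaries, UniProt summaries, or GO annotations.
    Averaging is an assumption; returns zeros when no source mentions the gene.
    """
    found = [src[gene] for src in sources if gene in src]
    if not found:
        return [0.0] * d_text
    return [sum(column) / len(found) for column in zip(*found)]


def fuse_gene_tokens(
    genes: List[str],
    gene_table: Dict[str, Vector],
    text_sources: List[Dict[str, Vector]],
    w_proj: Matrix,
) -> List[Vector]:
    """Add a projected language embedding to each learned gene embedding."""
    d_text = len(w_proj[0])  # projection input size fixes the text dimension
    tokens = []
    for gene in genes:
        lang = project(w_proj, text_embedding(text_sources, gene, d_text))
        # Element-wise sum keeps the token width at d_model.
        tokens.append([g + t for g, t in zip(gene_table[gene], lang)])
    return tokens
```

Summing, rather than concatenating, keeps each token the same width as the original gene embedding, so the downstream transformer layers need no changes. A gene with no text description simply keeps its experimentally learned embedding, which matches the idea of language acting as optional prior information.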
