scGenePT: Is language all you need for modeling single-cell perturbations?

Computer Science
Authors
Ana-Maria Istrate, Donghui Li, Theofanis Karaletsos
Identifier
DOI: 10.1101/2024.10.23.619972
Abstract

Modeling single-cell perturbations is a crucial task in the field of single-cell biology. Predicting the effect of up- or down-regulation of genes or of drug treatment on the gene expression profile of a cell can open avenues in understanding biological mechanisms and potentially treating disease. Most foundation models for single-cell biology learn from scRNA-seq counts, using experimental data as a modality to generate gene representations. Similarly, the scientific literature holds a plethora of information that can be used to generate gene representations from a different modality - language. In this work, we study the effect of using both language and experimental data in modeling genes for perturbation prediction. We show that textual representations of genes provide additive and complementary value to gene representations learned from experimental data alone in predicting perturbation outcomes for single-cell data. We find that textual representations alone are not as powerful as biologically learned gene representations, but can serve as useful prior information. We show that different types of scientific knowledge represented as language induce different types of prior knowledge. For example, in the datasets we study, subcellular location helps the most for predicting the effect of single-gene perturbations, and protein information helps the most for modeling the effects of perturbing combinations of genes. We validate our findings by extending the popular scGPT model, a foundation model trained on scRNA-seq counts, to incorporate language embeddings at the gene level. We start with NCBI gene card and UniProt protein summaries from the genePT approach and add gene function annotations from the Gene Ontology (GO). We name our model “scGenePT”, representing the combination of ideas from these two models. Our work sheds light on the value of integrating multiple sources of knowledge in modeling single-cell data, highlighting the effect of language in enhancing biological representations learned from experimental data.
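
To make the idea of incorporating language embeddings at the gene level concrete, the following is a minimal sketch and not the scGenePT implementation. It assumes that a text-derived gene embedding is linearly projected to the dimension of the experimentally learned gene embedding and then added element-wise; the abstract does not specify the fusion operation, so this choice is an assumption. The gene names, vectors, projection matrix, and the helper names `project` and `fuse_gene_embedding` are all hypothetical toy values chosen for illustration.

```python
from typing import Dict, List

Vector = List[float]


def project(vec: Vector, weights: List[Vector]) -> Vector:
    """Apply a linear map; `weights` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def fuse_gene_embedding(learned: Vector, text: Vector,
                        weights: List[Vector]) -> Vector:
    """Project the text embedding and add it to the learned embedding.

    Assumption: fusion is element-wise addition after projection.
    """
    projected = project(text, weights)
    if len(projected) != len(learned):
        raise ValueError("projected text embedding has the wrong dimension")
    return [a + b for a, b in zip(learned, projected)]


# Hypothetical toy data: 2-d learned embeddings, 3-d text embeddings.
learned_embeddings: Dict[str, Vector] = {
    "GENE_A": [0.5, -1.0],
    "GENE_B": [0.0, 2.0],
}
text_embeddings: Dict[str, Vector] = {
    "GENE_A": [1.0, 0.0, 2.0],
    "GENE_B": [0.0, 1.0, 1.0],
}
# Hypothetical projection from the 3-d text space to the 2-d learned space.
projection: List[Vector] = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, -0.5],
]

fused = {
    gene: fuse_gene_embedding(learned_embeddings[gene],
                              text_embeddings[gene],
                              projection)
    for gene in learned_embeddings
}
print(fused)
```

In a trained model the projection would be learned jointly with the rest of the network, and the text embeddings would come from a language model applied to sources such as the gene summaries or GO annotations mentioned in the abstract.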
