Jesse Zhang, Airol A Ubas, Richard de Borja, Valentine Svensson, Nicole Thomas, Neha Thakar, Ian Lai, Aidan Winters, Umair Khan, Matthew G. Jones, Vuong Tran, Joseph Pangallo, Efthymia Papalexi, Ajay Sapre, Hoai Thi Nguyen, Oliver Sanderson, Maria Nigos, Oren Kaplan, Sarah M. Schroeder, Bryan Hariadi
Identifier
DOI:10.1101/2025.02.20.639398
Abstract
Building predictive models of the cell requires systematically mapping how perturbations reshape each cell's state, function, and behavior. Here, we present Tahoe-100M, a giga-scale single-cell atlas of 100 million transcriptomic profiles measuring how each of 1,100 small-molecule perturbations impacts cells across 50 cancer cell lines. Our high-throughput Mosaic platform, composed of a highly diverse and optimally balanced 'cell village', reduces batch effects and enables parallel profiling of thousands of conditions at single-cell resolution and at an unprecedented scale. As the largest single-cell dataset to date, Tahoe-100M enables artificial-intelligence (AI)-driven models to learn context-dependent functions, capturing fundamental principles of gene regulation and network dynamics. Although we use cancer models and pharmacological compounds to create this resource, Tahoe-100M is designed as a broadly applicable perturbation atlas that supports deeper insights into cell biology across multiple tissues and contexts. By publicly releasing this atlas, we aim to accelerate the development of robust AI frameworks for systems biology, ultimately improving our ability to predict and manipulate cellular behaviors across a wide range of applications.