Sequential Optimal Experimental Design of Perturbation Screens Guided by Multi-modal Priors

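Computer science; Prior probability; Fidelity; Perturbation (astronomy); Machine learning; Artificial intelligence; Bayesian probability; Telecommunications; Physics; Quantum mechanics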

Authors

Kexin Huang, Romain Lopez, Jan-Christian Hütter, Takamasa Kudo, Antonio Ríos, Aviv Regev

Links

biorxiv.org, doi.org

Identifiers

DOI: 10.1101/2023.12.12.571389

Abstract

Understanding a cell’s expression response to genetic perturbations helps to address important challenges in biology and medicine, including the function of gene circuits, discovery of therapeutic targets and cell reprogramming and engineering. In recent years, Perturb-seq, pooled genetic screens with single-cell RNA-seq (scRNA-seq) readouts, has emerged as a common method to collect such data. However, irrespective of technological advances, because combinations of gene perturbations can have unpredictable, non-additive effects, the number of experimental configurations far exceeds experimental capacity, and for certain cases, the number of available cells. While recent machine learning models, trained on existing Perturb-seq data sets, can predict perturbation outcomes with some degree of accuracy, they are currently limited by sub-optimal training set selection and the small number of cell contexts of training data, leading to poor predictions for unexplored parts of perturbation space. As biologists deploy Perturb-seq across diverse biological systems, there is an enormous need for algorithms to guide iterative experiments while exploring the large space of possible perturbations and their combinations. Here, we propose a sequential approach for designing Perturb-seq experiments that uses the model to strategically select the most informative perturbations at each step for subsequent experiments. This enables a significantly more efficient exploration of the perturbation space, while predicting the effect of the rest of the unseen perturbations with high fidelity. Analysis of a previous large-scale Perturb-seq experiment reveals that our setting is severely restricted by the number of examples and rounds, falling into a non-conventional active learning regime called “active learning on a budget”.
Motivated by this insight, we develop IterPert, a novel active learning method that exploits rich and multi-modal prior knowledge in order to efficiently guide the selection of subsequent perturbations. Using prior knowledge for this task is novel, and crucial for successful active learning on a budget. We validate IterPert using in silico benchmarking of active learning, constructed from a large-scale CRISPRi Perturb-seq data set. We find that IterPert outperforms other active learning strategies by reaching comparable accuracy at only a third of the number of perturbations profiled as the next best method. Overall, our results demonstrate the potential of sequentially designing perturbation screens through IterPert.
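
To make the idea of round-by-round selection under a small budget concrete, the following is a minimal generic sketch and not the IterPert method itself, whose details the abstract does not give. It assumes that prior knowledge is available as one embedding vector per candidate perturbation (random placeholders here), turns those embeddings into an RBF kernel, and in each round greedily picks the candidates with the largest Gaussian-process posterior variance. The names `rbf_kernel`, `select_batch` and `run_rounds`, the choice of kernel, and the variance criterion are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(embeddings, length_scale=1.0):
    """Similarity between perturbations from (hypothetical) prior embeddings."""
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))


def select_batch(kernel, profiled, batch_size, jitter=1e-6):
    """Greedily pick the candidates with the highest posterior variance.

    Illustrative stand-in for "most informative": a perturbation that is
    dissimilar (under the prior kernel) to everything already profiled or
    already chosen in this batch keeps a high variance and is picked next.
    """
    chosen = []
    for _ in range(batch_size):
        known = list(profiled) + chosen
        variance = np.diag(kernel).copy()
        if known:
            k_kk = kernel[np.ix_(known, known)] + jitter * np.eye(len(known))
            k_ak = kernel[:, known]
            # GP posterior variance: k(x, x) - k(x, K) K^{-1} k(K, x)
            variance -= np.einsum("ij,ji->i", k_ak, np.linalg.solve(k_kk, k_ak.T))
            variance[known] = -np.inf  # never re-select a known perturbation
        chosen.append(int(np.argmax(variance)))
    return chosen


def run_rounds(kernel, n_rounds, batch_size):
    """Sequential design loop: a few rounds, a few perturbations per round."""
    profiled = []
    for _ in range(n_rounds):
        batch = select_batch(kernel, profiled, batch_size)
        # In a real screen the batch would be profiled here and a predictive
        # model retrained; this sketch only tracks which items were selected.
        profiled.extend(batch)
    return profiled


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prior_embeddings = rng.normal(size=(50, 8))  # placeholder prior knowledge
    kernel = rbf_kernel(prior_embeddings, length_scale=2.0)
    print(run_rounds(kernel, n_rounds=3, batch_size=4))
```

Because this criterion depends only on the prior kernel, it needs no measured outcomes, which is one reason variance-style or coverage-style selection is often considered when labels are scarce. A model-based strategy would additionally use the expression profiles collected in earlier rounds.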
