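Computer science
Selection (genetic algorithm)
Robustness (evolution)
DNA sequencing
Poisson distribution
Randomness
Data mining
Computational biology
Algorithm
DNA
Biology
Genetics
Mathematics
Machine learning
Statistics
Gene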
Authors
Letian Kuai, Thomas O’Keeffe, Christopher C. Arico-Muendel
Identifier
DOI: 10.1177/2472555218757718
Abstract
DNA Encoded Libraries (DELs) use unique DNA sequences to tag each chemical warhead within a library mixture to enable deconvolution following affinity selection against a target protein. With next-generation sequencing, millions to billions of sequences can be read and counted to report binding events. This unprecedented capability has enabled researchers to synthesize and analyze numerically large chemical libraries. Despite the common perception that each library member undergoes a miniaturized affinity assay, selections with higher complexity libraries often produce results that are difficult to rank order. In this study, we aimed to understand the robustness of DEL selection by examining the sequencing readouts of warheads and chemotype families among a large number of experimentally repeated selections. The results revealed that (1) the output of DEL selection is intrinsically noisy but can be reliably modeled by the Poisson distribution, and (2) Poisson noise is the dominating noise at low copy counts and can be estimated even from a single experiment. We also discuss the shortcomings of data analyses based on directly using copy counts and their linear transformations, and propose a framework that incorporates proper normalization and confidence interval calculation to help researchers better understand DEL data.
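To make the confidence-interval idea concrete, here is a minimal sketch of how a single copy count could be given a Poisson confidence interval and then normalized. It is not the authors' framework, which the abstract does not spell out. It assumes that an observed copy count is one draw from a Poisson distribution, uses the standard exact (Garwood-style) interval found by bisection, and, purely for illustration, normalizes by the count expected under uniform sampling (total reads divided by library size). The function names `poisson_cdf`, `poisson_ci`, and `enrichment_with_ci` and their parameters are hypothetical.

```python
import math
from typing import Tuple


def poisson_cdf(n: int, mu: float) -> float:
    """P(X <= n) for X ~ Poisson(mu), with each term computed in log space."""
    if n < 0:
        return 0.0
    if mu <= 0.0:
        return 1.0
    log_mu = math.log(mu)
    total = sum(math.exp(i * log_mu - mu - math.lgamma(i + 1)) for i in range(n + 1))
    return min(total, 1.0)


def _mu_where_cdf_equals(n: int, p: float) -> float:
    """Find mu with P(X <= n | mu) = p by bisection (the CDF decreases in mu)."""
    lo, hi = 0.0, 2.0 * n + 20.0
    while poisson_cdf(n, hi) > p:  # widen the bracket if it is too narrow
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n, mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_ci(count: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) interval for a Poisson mean from one count."""
    if count < 0:
        raise ValueError("count must be non-negative")
    lower = 0.0 if count == 0 else _mu_where_cdf_equals(count - 1, 1.0 - alpha / 2.0)
    upper = _mu_where_cdf_equals(count, alpha / 2.0)
    return lower, upper


def enrichment_with_ci(
    count: int, total_reads: int, library_size: int, alpha: float = 0.05
) -> Tuple[float, float, float]:
    """Scale a count and its interval by the count expected under uniform sampling.

    Assumption for illustration only: expected = total_reads / library_size.
    """
    expected = total_reads / library_size
    lower, upper = poisson_ci(count, alpha)
    return count / expected, lower / expected, upper / expected


if __name__ == "__main__":
    # Hypothetical numbers: 2 million reads over a 1-million-member library.
    for observed in (0, 1, 10):
        est, lo, hi = enrichment_with_ci(observed, 2_000_000, 1_000_000)
        print(f"count={observed:3d}  enrichment={est:.2f}  95% CI=({lo:.2f}, {hi:.2f})")
```

At low copy counts the intervals are wide and overlap heavily, so two library members with nearby counts cannot be reliably separated from one experiment. This is one way to see why rank ordering by raw counts is fragile.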