Optimal Deconvolution of Transcriptional Profiling Data Using Quadratic Programming with Application to Complex Clinical Blood Samples

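Keywords: Deconvolution; Computational biology; Profiling (computer programming); Population; Gene expression profiling; Computer science; Latent variable; Biology; Bioinformatics; Data mining; Biological system; Statistics; Mathematics; Algorithm; Gene expression; Artificial intelligence; Medicine; Gene; Genetics; Environmental health; Operating system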

Authors

Ting Gong, Nicole Hartmann, Isaac S. Kohane, Volker Brinkmann, Frank Staedtler, Martin Letzkus, Sandrine Bongiovanni, Joseph D. Szustakowski

Source

Journal: PLOS ONE [Public Library of Science]
Date: 2011-11-16 Volume (issue): 6 (11): e27156 Citations: 151

Links

plos.org, doaj.org, europepmc.org, nih.gov, doi.org

Identifier

DOI: 10.1371/journal.pone.0027156

Abstract

Large-scale molecular profiling technologies have assisted the identification of disease biomarkers and facilitated the basic understanding of cellular processes. However, samples collected from human subjects in clinical trials possess a level of complexity, arising from multiple cell types, that can obfuscate the analysis of data derived from them. Failure to identify, quantify, and incorporate sources of heterogeneity into an analysis can have widespread and detrimental effects on subsequent statistical studies. We describe an approach that builds upon a linear latent variable model, in which expression levels from mixed cell populations are modeled as the weighted average of expression from different cell types. We solve these equations using quadratic programming, which efficiently identifies the globally optimal solution while preserving non-negativity of the fraction of the cells. We applied our method to various existing platforms to estimate proportions of different pure cell or tissue types and gene expression profiles of distinct phenotypes, with a focus on complex samples collected in clinical trials. We tested our methods on several well controlled benchmark data sets with known mixing fractions of pure cell or tissue types and mRNA expression profiling data from samples collected in a clinical trial. Accurate agreement between predicted and actual mixing fractions was observed. In addition, our method was able to predict mixing fractions for more than ten species of circulating cells and to provide accurate estimates for relatively rare cell types (<10% total population). Furthermore, accurate changes in leukocyte trafficking associated with Fingolimod (FTY720) treatment were identified that were consistent with previous results generated by both cell counts and flow cytometry.
These data suggest that our method can solve one of the open questions regarding the analysis of complex transcriptional data: namely, how to identify the optimal mixing fractions in a given experiment.
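
The constrained least-squares problem described in the abstract can be illustrated with a small sketch. The abstract states only that mixed expression is modeled as a weighted average of cell-type expression and that the fractions remain non-negative. The sum-to-one constraint, the synthetic signature matrix, the function name estimate_fractions, and the use of SciPy's general-purpose SLSQP solver in place of whichever quadratic programming routine the authors used are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def estimate_fractions(signatures, mixture):
    """Estimate mixing fractions f minimizing ||signatures @ f - mixture||^2.

    signatures: (n_genes, n_cell_types) expression of each pure cell type.
    mixture:    (n_genes,) expression measured in the mixed sample.
    Constraints (sum-to-one is an assumption): f >= 0 and sum(f) == 1.
    """
    signatures = np.asarray(signatures, dtype=float)
    mixture = np.asarray(mixture, dtype=float)
    n_types = signatures.shape[1]

    def objective(f):
        # Quadratic objective: half the squared residual norm.
        residual = signatures @ f - mixture
        return 0.5 * residual @ residual

    def gradient(f):
        # Analytic gradient of the objective above.
        return signatures.T @ (signatures @ f - mixture)

    result = minimize(
        objective,
        x0=np.full(n_types, 1.0 / n_types),  # start from equal fractions
        jac=gradient,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_types,  # non-negativity of each fraction
        constraints=[{"type": "eq", "fun": lambda f: np.sum(f) - 1.0}],
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x


# Synthetic, noise-free example (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
signatures = rng.uniform(0.0, 10.0, size=(50, 3))  # 50 genes, 3 cell types
true_fractions = np.array([0.6, 0.3, 0.1])
mixture = signatures @ true_fractions
estimated = estimate_fractions(signatures, mixture)
print(np.round(estimated, 3))
```

Because the objective is convex and the constraints are linear, any local optimum the solver reaches is also the global one, which is the property the abstract highlights. With real data the signature matrix would come from measured pure cell-type profiles, and noise would keep the residual above zero.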
