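Population
Computer science
Single-cell sequencing
RNA (ribonucleic acid)
Computational biology
RNA-Seq
k-nearest neighbors algorithm
Artificial intelligence
Matching (statistics)
Data mining
Biology
Mathematics
Gene
Gene expression
Genetics
Phenotype
Statistics
Transcriptome
Exome sequencing
Demography
Sociology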
Authors
Laleh Haghverdi, Aaron T. L. Lun, Michael D. Morgan, John C. Marioni
Abstract
Differences in gene expression between individual cells of the same type are measured across batches and used to correct technical artifacts in single-cell RNA-sequencing data. Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
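The idea of detecting mutual nearest neighbors between two batches can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mutual_nearest_neighbors`, the parameter `k`, and the use of plain Euclidean distance on raw coordinates are assumptions made for illustration only. A cell in batch A and a cell in batch B form an MNN pair when each is among the other's k nearest neighbors in the opposite batch.

```python
import numpy as np


def mutual_nearest_neighbors(batch_a, batch_b, k=2):
    """Return (i, j) pairs where cell i of batch_a and cell j of batch_b
    are each within the other's k nearest neighbors (hypothetical helper)."""
    batch_a = np.asarray(batch_a, dtype=float)
    batch_b = np.asarray(batch_b, dtype=float)
    n_a, n_b = batch_a.shape[0], batch_b.shape[0]

    # Pairwise squared Euclidean distances, shape (n_a, n_b).
    # Assumption: Euclidean distance on the given coordinates.
    diff = batch_a[:, None, :] - batch_b[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)

    # For each cell in A, indices of its k nearest cells in B.
    nn_ab = np.argsort(dist, axis=1)[:, :k]
    # For each cell in B, indices of its k nearest cells in A.
    nn_ba = np.argsort(dist, axis=0)[:k, :].T

    # Boolean masks marking the two directed neighbor relations.
    a_to_b = np.zeros((n_a, n_b), dtype=bool)
    a_to_b[np.arange(n_a)[:, None], nn_ab] = True
    b_to_a = np.zeros((n_a, n_b), dtype=bool)
    b_to_a[nn_ba, np.arange(n_b)[:, None]] = True

    # A pair is mutual only if both directions hold.
    return [(int(i), int(j)) for i, j in np.argwhere(a_to_b & b_to_a)]


# Toy data: two shared populations plus one population present only in batch B.
toy_a = [[0.0, 0.0], [10.0, 10.0]]
toy_b = [[0.5, 0.5], [10.5, 10.5], [100.0, 100.0]]
toy_pairs = mutual_nearest_neighbors(toy_a, toy_b, k=1)
```

In the toy data, the third cell of batch B belongs to a population absent from batch A, so it ends up in no mutual pair. This mirrors the abstract's point that only a subset of the population needs to be shared between batches.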