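Imputation (statistics)
Missing data
Inference
Shotgun proteomics
Computer science
Data set
Data mining
Statistical inference
Statistics
Proteomics
Artificial intelligence
Mathematics
Machine learning
Chemistry
Biochemistry
Gene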
Authors
Lisa Bramer, Jan Irvahn, Paul Piehowski, Karin Rodland, Bobbie‐Jo Webb‐Robertson
Identifier
DOI:10.1021/acs.jproteome.0c00123
Abstract
The throughput efficiency and increased depth of coverage provided by isobaric-labeled proteomics measurements have led to increased use of these techniques. However, the structure of missing data differs from that of unlabeled studies, which motivates this review: it compares the efficacy of nine imputation methods on large isobaric-labeled proteomics data sets to guide researchers on when each method is appropriate. The methods were evaluated on accuracy, statistical hypothesis-test inference, and run time. In general, expectation maximization and random forest imputation methods performed best, and constant-based methods performed consistently poorly across all data set sizes and percentages of missing values. For data sets with small sample sizes and higher percentages of missing data, the results indicate that statistical inference with no imputation may be preferable. Based on these findings, a core set of imputation methods performs better on isobaric-labeled proteomics data, but for data sets consisting of a small number of samples, careful consideration should be given to whether imputation is the optimal strategy at all.
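As a minimal, hypothetical illustration (not the review's evaluation pipeline or data), the Python sketch below uses made-up log2 intensities for a single protein to show one reason constant-based imputation can mislead downstream inference. It assumes the constant is the minimum observed value, one common choice; the helper names `observed`, `constant_impute`, and `standard_error` are invented for this example.

```python
import math
import statistics

# Hypothetical log2 reporter-ion intensities for one protein across 8 channels.
# None marks a missing value. Illustrative numbers only, not from the study.
values = [20.1, 19.8, None, 20.4, None, 20.0, 19.9, None]


def observed(xs):
    """Drop missing entries: the 'no imputation' (available-case) view."""
    return [x for x in xs if x is not None]


def constant_impute(xs, fill):
    """Replace every missing entry with one fixed constant."""
    return [fill if x is None else x for x in xs]


def standard_error(xs):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(xs) / math.sqrt(len(xs))


obs = observed(values)
# Assumed constant: the minimum observed value for this protein.
filled = constant_impute(values, min(obs))

print("observed only:", len(obs), statistics.mean(obs), standard_error(obs))
print("constant fill:", len(filled), statistics.mean(filled), standard_error(filled))
```

With these numbers the mean drops from 20.04 to 19.95 and the standard error shrinks from roughly 0.10 to roughly 0.08, because three invented values at the same constant both pull the mean toward the fill value and inflate the apparent sample size. A downstream hypothesis test would therefore look more certain than the observed data justify.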