缺少数据
插补(统计学)
标杆管理
计算机科学
定量蛋白质组学
数据挖掘
统计能力
统计
蛋白质组学
机器学习
数学
化学
生物化学
营销
业务
基因
作者
Lincoln Harris,William E. Fondrie,Sewoong Oh,William Stafford Noble
标识
DOI:10.1021/acs.jproteome.3c00205
摘要
Quantitative measurements produced by tandem mass spectrometry proteomics experiments typically contain a large proportion of missing values. Missing values hinder reproducibility, reduce statistical power, and make it difficult to compare across samples or experiments. Although many methods exist for imputing missing values, in practice, the most commonly used methods are among the worst performing. Furthermore, previous benchmarking studies have focused on relatively simple measurements of error such as the mean-squared error between imputed and held-out values. Here we evaluate the performance of commonly used imputation methods using three practical, "downstream-centric" criteria. These criteria measure the ability to identify differentially expressed peptides, generate new quantitative peptides, and improve the peptide lower limit of quantification. Our evaluation comprises several experiment types and acquisition strategies, including data-dependent and data-independent acquisition. We find that imputation does not necessarily improve the ability to identify differentially expressed peptides but that it can identify new quantitative peptides and improve the peptide lower limit of quantification. We find that MissForest is generally the best performing method per our downstream-centric criteria. We also argue that existing imputation methods do not properly account for the variance of peptide quantifications and highlight the need for methods that do.
科研通智能强力驱动
Strongly Powered by AbleSci AI