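Univariate
Chemistry
False positive paradox
Univariate analysis
Metabolomics
Chromatography
Multivariate statistics
Multivariate analysis
Statistics
Mathematics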
Authors
Suyun Xu, Caihong Bai, Yanli Chen, Lingling Yu, Wenjun Wu, Kaifeng Hu
Identifier
DOI:10.1016/j.aca.2023.342103
Abstract
PLS-DA of high-dimensional metabolomics data is frequently employed to capture the features most pertinent to sample classification. However, the presence of numerous insignificant input features can distort the PLS-DA model and both inflate and scramble the set of selected differential features. Univariate filtration is usually applied afterwards to refine the selected features, but it often gives unstable results. By contrast, when insignificant features are excluded beforehand through univariate data prefiltration based on the FDR-adjusted p-value, PLS-DA can generate more stable and reliable differential features. We explored and compared these two data analysis procedures to gain insight into the mechanisms responsible for the disparate results. The effect of univariate data filtration preceding and succeeding PLS-DA on the identified discriminative features/metabolites was investigated using LC-MS data acquired on samples of human serum and C. elegans extracts, with and without spiked metabolite standards, to simulate the treated and control groups of biological samples. Univariate data prefiltration before PLS-DA usually gave fewer but more stable, and likely more reliable and meaningful, differential features, whereas PLS-DA applied directly to the original data could be affected by the presence of insignificant features and orthogonal noise. A large number of insignificant variables and orthogonal noise could distort the generated PLS-DA model and affect the p(corr) value, and could artificially inflate the calculated VIP values of relevant features because of the increased total number of input features used for model construction, thus leading to more false positives being selected by the conventional VIP threshold of 1.0. Univariate data filtration preceding PLS-DA was therefore important for identifying reliable differential features when the conventional VIP threshold of 1.0 is used.
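The prefiltration step described in the abstract can be sketched in a few lines. The abstract does not say which FDR procedure was used, so the Benjamini-Hochberg adjustment below is an assumption, as are the 0.05 cutoff and the function names `benjamini_hochberg` and `prefilter`. The raw p-values are taken as given (for example from a per-feature t-test between treated and control groups).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Feature indices sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def prefilter(feature_names, p_values, alpha=0.05):
    """Keep features whose adjusted p-value is below alpha (assumed cutoff)."""
    q_values = benjamini_hochberg(p_values)
    return [name for name, q in zip(feature_names, q_values) if q < alpha]


# Hypothetical feature names and raw p-values, for illustration only.
kept = prefilter(["f1", "f2", "f3", "f4"], [0.001, 0.8, 0.02, 0.5])
print(kept)
```

Only the features returned by `prefilter` would then be passed to PLS-DA, which is the "univariate filtration preceding PLS-DA" order compared in the abstract.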
The presence of insignificant features can distort the PLS-DA model and inflate VIP values. The appropriate VIP threshold is associated with the number of input features and the number of model components. For PLS-DA without univariate prefiltration, a VIP threshold larger than 1.0 is recommended for selecting discriminative features, to reduce false positives.
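A toy calculation can make the link between VIP values and the number of input features concrete. The sketch below is not the authors' computation: it assumes the commonly used VIP definition, restricted to a one-component model with a single response, where the weight vector is the feature-response covariance vector scaled to unit length and VIP reduces to sqrt(p) * |w_j| for p features. The covariance values and the function name `vip_one_component` are made up for illustration.

```python
import math


def vip_one_component(cov_with_y):
    """VIP scores for a one-component PLS model with a single response.

    cov_with_y holds each feature's covariance with the class label. In this
    setting the weight vector is that covariance vector scaled to unit length,
    and the usual VIP definition reduces to sqrt(p) * |w_j|.
    """
    p = len(cov_with_y)
    norm = math.sqrt(sum(c * c for c in cov_with_y))
    return [math.sqrt(p) * abs(c) / norm for c in cov_with_y]


# Hypothetical covariances: two strong features and one weak feature.
signal = [1.0, 1.0, 0.5]
# The same three features plus 20 near-irrelevant ones (made-up value 0.1).
with_noise = signal + [0.1] * 20

vip_signal = vip_one_component(signal)
vip_noisy = vip_one_component(with_noise)

# VIP of the weak feature before and after adding the irrelevant features.
print(round(vip_signal[2], 3), round(vip_noisy[2], 3))
```

Because the mean of the squared VIP values is always 1 under this definition, adding many low-weight features pushes the VIP of every higher-weight feature upward. In this toy case the weak feature moves from about 0.58 to about 1.53 and would now pass a threshold of 1.0, which is consistent with the recommendation to raise the threshold when no prefiltration is applied.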