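Metabolomics
Database normalization
Normalization (sociology)
Computer science
Data mining
Proportion (ratio)
Regression analysis
Computational biology
Pattern recognition (psychology)
Artificial intelligence
Regression
Biology
Mathematics
Statistics
Bioinformatics
Machine learning
Geography
Cartography
Sociology
Anthropology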
Authors
Xin Shen,Xun Gong,Yuping Cai,Yuan Guo,Jia Tu,Hao Li,Tao Zhang,Jialin Wang,Fei Xue,Zheng-Jiang Zhu
Source
Journal: Metabolomics
[Springer Nature]
Date: 2016-03-26
Volume (Issue): 12 (5)
Citations: 116
Identifier
DOI:10.1007/s11306-016-1026-5
Abstract
Untargeted metabolomics studies for biomarker discovery often involve hundreds to thousands of human samples. Data acquisition for such large sample sets has to be divided into several batches and may span months or even several years. Signal drift of metabolites during data acquisition (intra- and inter-batch) is unavoidable and is a major confounding factor for large-scale metabolomics studies. We aimed to develop a data normalization method that reduces unwanted variation and integrates multiple batches in large-scale metabolomics studies prior to statistical analysis. We developed a normalization and integration method for large-scale metabolomics data based on a machine learning algorithm, support vector regression (SVR). An R package named MetNormalizer was developed and provided for data processing using SVR normalization. After SVR normalization, the proportion of metabolite ion peaks with relative standard deviations (RSDs) below 30% increased to more than 90% of the total peaks, which is much better than other common normalization methods. The reduction of unwanted analytical variation helps to improve the performance of multivariate statistical analyses, both unsupervised and supervised, in terms of classification and prediction accuracy, so that subtle metabolic changes in epidemiological studies can be detected. SVR normalization can effectively remove unwanted intra- and inter-batch variation and is much better than other common normalization methods.
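The abstract's headline metric, the share of peaks with RSD below 30%, can be illustrated with a minimal Python sketch. This is not the MetNormalizer implementation and does not perform SVR normalization; it only shows how such a summary might be computed. It assumes that RSD is the sample standard deviation divided by the mean, expressed in percent, and that it is evaluated over repeated injections of the same sample. The function names, the peak names, and the intensity values are all hypothetical.

```python
from statistics import mean, stdev


def rsd_percent(intensities):
    """Relative standard deviation (sample SD / mean), in percent."""
    m = mean(intensities)
    if m == 0:
        raise ValueError("mean intensity is zero; RSD is undefined")
    return 100.0 * stdev(intensities) / m


def fraction_below_threshold(peak_table, threshold=30.0):
    """Return (fraction of peaks with RSD < threshold, per-peak RSDs).

    peak_table maps a peak name to its intensities across repeated
    injections (an assumed layout, not taken from the paper).
    """
    rsds = {peak: rsd_percent(values) for peak, values in peak_table.items()}
    passing = [peak for peak, r in rsds.items() if r < threshold]
    return len(passing) / len(rsds), rsds


# Hypothetical intensities for three peaks over four repeated injections.
qc_peaks = {
    "peak_A": [100.0, 102.0, 98.0, 101.0],   # stable signal
    "peak_B": [100.0, 150.0, 60.0, 200.0],   # strongly drifting signal
    "peak_C": [50.0, 55.0, 45.0, 52.0],      # mildly variable signal
}

fraction, rsds = fraction_below_threshold(qc_peaks)
for peak, value in rsds.items():
    print(f"{peak}: RSD = {value:.1f}%")
print(f"Fraction of peaks with RSD < 30%: {fraction:.2f}")
```

Running the same summary before and after a normalization step gives a before/after comparison of the kind the abstract reports, where a higher fraction indicates less unwanted analytical variation.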