Retip: Retention Time Prediction for Compound Annotation in Untargeted Metabolomics

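Chemistry, Random forest, Hydrophilic interaction chromatography, Artificial intelligence, Test set, Workflow, Retention time, Overfitting, Machine learning, Chromatography, Metabolomics, Naive Bayes classifier, Computer science, Artificial neural network, Annotation, High-performance liquid chromatography, Metabolome, Database, Support vector machine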

Authors

Paolo Bonini, Tobias Kind, Hiroshi Tsugawa, Dinesh Kumar Barupal, Oliver Fiehn

Source

Journal: Analytical Chemistry [American Chemical Society]
Date: 2020-05-11 Volume (issue): 92 (11): 7515-7522 Citations: 161

Links

nih.gov, escholarship.org, doi.org

Identifiers

DOI: 10.1021/acs.analchem.9b05765

Abstract

Unidentified peaks remain a major problem in untargeted metabolomics by LC-MS/MS. Confidence in peak annotations increases by combining MS/MS matching and retention time. We here show how retention times can be predicted from molecular structures. Two large, publicly available data sets were used for model training in machine learning: the Fiehn hydrophilic interaction liquid chromatography data set (HILIC) of 981 primary metabolites and biogenic amines, and the RIKEN plant specialized metabolome annotation (PlaSMA) database of 852 secondary metabolites that uses reversed-phase liquid chromatography (RPLC). Five different machine learning algorithms have been integrated into the Retip R package: the random forest, Bayesian-regularized neural network, XGBoost, light gradient-boosting machine (LightGBM), and Keras algorithms for building the retention time prediction models. A complete workflow for retention time prediction was developed in R. It can be freely downloaded from the GitHub repository (https://www.retip.app). Keras outperformed other machine learning algorithms in the test set with minimum overfitting, verified by small error differences between training, test, and validation sets. Keras yielded a mean absolute error of 0.78 min for HILIC and 0.57 min for RPLC. Retip is integrated into the mass spectrometry software tools MS-DIAL and MS-FINDER, allowing a complete compound annotation workflow. In a test application on mouse blood plasma samples, we found a 68% reduction in the number of candidate structures when searching all isomers in MS-FINDER compound identification software. Retention time prediction increases the identification rate in liquid chromatography and subsequently leads to an improved biological interpretation of metabolomics data.
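The two quantities the abstract reports, a mean absolute error between observed and predicted retention times and a reduction in candidate structures, can be illustrated with a minimal Python sketch. This is not the Retip API (Retip is an R package); the function names, the example values, and the idea of a fixed tolerance window are assumptions made purely for illustration.

```python
from statistics import mean


def mean_absolute_error(observed, predicted):
    """Mean absolute difference (in minutes) between paired retention times."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def filter_candidates(candidates, observed_rt, tolerance):
    """Keep candidate names whose predicted retention time lies within
    `tolerance` minutes of the observed retention time.

    `candidates` is a list of (name, predicted_rt) pairs; the tolerance
    window is a hypothetical choice, not a value taken from the paper.
    """
    return [name for name, rt in candidates if abs(rt - observed_rt) <= tolerance]


# Hypothetical values, for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.0]
mae = mean_absolute_error(observed, predicted)

isomers = [("isomer_A", 2.1), ("isomer_B", 5.0), ("isomer_C", 2.9)]
kept = filter_candidates(isomers, observed_rt=2.5, tolerance=0.5)
```

In practice the tolerance would presumably be tied to the model's validated error on the chromatographic method in use, so a wider window trades fewer false rejections for a smaller reduction in candidates.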
