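Computer science
Artificial intelligence
Proteomics
Classifier (UML)
Benchmarking
Deep learning
Mass spectrometry
Metabolomics
Pattern recognition (psychology)
Data mining
Machine learning
Chemistry
Chromatography
Biochemistry
Gene
Business
Marketing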
Authors
Yi Liu,Yingying Zhang,Yuanjun Zhai,Fuchu He,Ying–Hui Zhu,Cheng Chang
Identifier
DOI:10.1101/2022.12.24.521877
Abstract
Retention time (RT) alignment is a crucial step in liquid chromatography-mass spectrometry (LC-MS)-based proteomic and metabolomic experiments, especially for large cohort studies. It can be achieved computationally; the most popular approaches are the warping function method and the direct matching method. However, existing tools struggle to handle monotonic and non-monotonic RT shifts simultaneously. To overcome this, we developed a deep learning-based RT alignment tool, DeepRTAlign, for large cohort LC-MS data analysis. It first performs a coarse alignment by calculating the average time shift between any two samples and then uses RT and m/z as the main features to train its deep learning-based model. We demonstrate that DeepRTAlign achieves improved performance, especially when handling complex samples, by benchmarking it against current state-of-the-art approaches on 19 real-world proteomic and metabolomic datasets and the corresponding simulated datasets. Benchmarking on a dataset with known fold changes showed that DeepRTAlign can improve the identification sensitivity of MS data without compromising quantitative accuracy. Furthermore, using the MS features aligned by DeepRTAlign in a large cohort, we trained a classifier of 15 features to predict the early recurrence of hepatocellular carcinoma. The features were validated on an independent cohort using targeted proteomics with an AUC of 0.833. Flexible and robust across four different feature extraction tools, DeepRTAlign provides an advanced solution to RT alignment in large cohort LC-MS data, which is currently one of the bottlenecks in proteomics and metabolomics research, especially for clinical applications.
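The coarse alignment step mentioned in the abstract (estimating the average time shift between two samples) can be illustrated with a minimal Python sketch. This is not the DeepRTAlign implementation. The function names `average_rt_shift` and `coarse_align`, the representation of features as `(rt, mz)` tuples, the ppm-based m/z matching rule, and the default tolerances are all assumptions made for illustration; the abstract does not specify how feature pairs are selected.

```python
from statistics import mean


def average_rt_shift(ref, sample, mz_tol_ppm=10.0, rt_window=5.0):
    """Mean RT difference (sample minus reference) over matched features.

    `ref` and `sample` are lists of (rt, mz) tuples. The matching rule
    (m/z within `mz_tol_ppm`, RT within `rt_window`) is an illustrative
    assumption, not the rule used by DeepRTAlign.
    """
    shifts = []
    for rt_s, mz_s in sample:
        # Reference features close in both m/z and RT to this sample feature.
        candidates = [
            rt_r
            for rt_r, mz_r in ref
            if abs(mz_s - mz_r) / mz_r * 1e6 <= mz_tol_ppm
            and abs(rt_s - rt_r) <= rt_window
        ]
        if candidates:
            # Keep only the nearest reference feature in RT.
            nearest = min(candidates, key=lambda rt_r: abs(rt_s - rt_r))
            shifts.append(rt_s - nearest)
    # With no matches there is no evidence of a shift, so return 0.
    return mean(shifts) if shifts else 0.0


def coarse_align(ref, sample, **kwargs):
    """Subtract the estimated average shift from every RT in `sample`."""
    shift = average_rt_shift(ref, sample, **kwargs)
    return [(rt - shift, mz) for rt, mz in sample]


# Toy data: the sample elutes about one time unit later than the reference.
reference = [(10.0, 500.0), (20.0, 600.0), (30.0, 700.0)]
shifted = [(11.0, 500.001), (21.0, 600.001), (31.0, 700.001)]
aligned = coarse_align(reference, shifted)
```

A single global shift of this kind only removes a constant offset between runs. According to the abstract, the remaining, possibly non-monotonic, differences are handled by the deep learning model that uses RT and m/z as its main features.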