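Metabolomics
Metabolite
Computer science
Tandem mass spectrometry
Computational biology
Mass spectrum
Substructure
Biological system
Chemistry
Artificial intelligence
Pattern recognition (psychology)
Mass spectrometry
Bioinformatics
Biology
Biochemistry
Chromatography
Structural engineering
Engineering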
Authors
Samuel Goldman,Jeremy Wohlwend,Martin Stražar,Guy Haroush,Ramnik J. Xavier,Connor W. Coley
Identifier
DOI:10.1038/s42256-023-00708-3
Abstract
Metabolomics studies have identified small molecules that mediate cell signaling, competition and disease pathology, in part due to large-scale community efforts to measure tandem mass spectra for thousands of metabolite standards. Nevertheless, the majority of spectra observed in clinical samples cannot be unambiguously matched to known structures. Deep learning approaches to small-molecule structure elucidation have surprisingly failed to rival classical statistical methods, which we hypothesize is due to the lack of in-domain knowledge incorporated into current neural network architectures. Here we introduce a neural network-driven workflow for untargeted metabolomics, Metabolite Inference with Spectrum Transformers (MIST), to annotate tandem mass spectra peaks with chemical structures. Unlike existing approaches, MIST incorporates domain insights into its architecture by encoding peaks with their chemical formula representations, implicitly featurizing pairwise neutral losses and training the network to additionally predict substructure fragments. MIST performs favorably compared with both standard neural architectures and the state-of-the-art kernel method on the task of fingerprint prediction for over 70% of metabolite standards and retrieves 66% of metabolites with equal or improved accuracy, with 29% strictly better. We further demonstrate the utility of MIST by suggesting potential dipeptide and alkaloid structures for differentially abundant spectra found in an inflammatory bowel disease patient cohort.
Tandem mass spectroscopy is a useful tool to identify metabolites but is limited by the capability of computational methods to annotate peaks with chemical structures when spectra are dissimilar to previously observed spectra. Goldman and colleagues use a transformer-based method to annotate chemical structure fragments, thereby incorporating domain insights into its architecture, and to simultaneously predict the structure of the metabolite and its fragments from the spectrum.
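To make the notion of pairwise neutral losses concrete, here is a minimal sketch that computes them explicitly as differences between peak chemical formulas. It is an illustration only and not MIST's implementation: the abstract says the network featurizes neutral losses implicitly, and the helper names (`parse_formula`, `neutral_loss`, `pairwise_neutral_losses`) and the example formulas are hypothetical. It assumes simple formulas with no parentheses, charges or isotope labels.

```python
import re
from collections import Counter
from itertools import combinations

# Matches an element symbol followed by an optional count, e.g. "C6" or "O".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Turn a flat formula string such as 'C6H12O6' into element counts."""
    counts = Counter()
    for element, number in _TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def format_formula(counts: dict) -> str:
    """Render element counts as a string, elements in alphabetical order."""
    return "".join(
        f"{el}{n if n > 1 else ''}" for el, n in sorted(counts.items()) if n > 0
    )


def neutral_loss(larger: Counter, smaller: Counter):
    """Return larger - smaller, or None if smaller is not a sub-formula."""
    diff = {}
    for el in set(larger) | set(smaller):
        d = larger[el] - smaller[el]
        if d < 0:
            return None  # smaller has atoms that larger lacks
        if d > 0:
            diff[el] = d
    return diff


def pairwise_neutral_losses(peak_formulas):
    """Map (larger_peak, smaller_peak) to the formula lost between them."""
    parsed = {f: parse_formula(f) for f in peak_formulas}
    losses = {}
    for a, b in combinations(peak_formulas, 2):
        for big, small in ((a, b), (b, a)):
            diff = neutral_loss(parsed[big], parsed[small])
            if diff:  # skip invalid pairs (None) and identical formulas ({})
                losses[(big, small)] = format_formula(diff)
    return losses


if __name__ == "__main__":
    # Hypothetical peak annotations, chosen only for illustration.
    peaks = ["C6H12O6", "C6H10O5", "C3H6O3"]
    for (big, small), loss in pairwise_neutral_losses(peaks).items():
        print(f"{big} -> {small}: loss of {loss}")
```

Every pair of annotated peaks yields at most one loss, so a spectrum with n peaks produces on the order of n² candidate features. That quadratic growth is one reason a model might prefer to capture these relationships implicitly, for example through attention over formula-encoded peaks, rather than enumerating them.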