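Computer science
Identification (biology)
Key (lock)
Hydrogen
Molecule
Mass spectrum
Spectral line
Mass spectrometry
Reliability (semiconductor)
Data mining
Algorithm
Chemistry
Physics
Computer security
Chromatography
Botany
Quantum mechanics
Biology
Power (physics)
Organic chemistry
Astronomy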
Authors
Yanmin Liu, Xuan Zhang, Wei Zhao, Daming Zhu, Xuefeng Cui
Identifier
DOI:10.1109/bibm58861.2023.10385903
Abstract
Mass spectrometry is a key technology for the identification of small molecules. However, traditional methods that rely on database comparisons have difficulty with newly discovered molecules that are not in the database. Recent advances in deep learning allow for direct analysis of mass spectra, which makes it possible to predict chemical structures without using a database. We have found that the accurate prediction of hydrogen atoms is a major challenge for the prediction of chemical structures, especially since they are not explicitly represented in SMILES. To address this challenge, we introduce MS2SMILES, a novel approach that treats hydrogen atoms as implicitly linked to heavy atoms. This method enables the model to predict both heavy atoms and hydrogen atoms accurately (instead of just focusing on heavy atoms) during the training phase. Additionally, MS2SMILES incorporates the SMILES grammatical rules when predicting chemical structures, increasing the reliability of the generated SMILES representations. We tested MS2SMILES using the GNPS and CASMI 2016 datasets, and it achieved SMILES prediction accuracies of 53.6% and 63.8%, respectively. These results demonstrate a significant improvement of 19.9% and 10.9% compared to the current leading method.
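To illustrate why hydrogens are "not explicitly represented in SMILES", the sketch below recovers the implicit hydrogen count attached to each heavy atom of a simple SMILES string. It is not the MS2SMILES implementation, whose encoding the abstract does not describe. The function names `implicit_h` and `heavy_atoms_with_h` are hypothetical, and the parser assumes a restricted subset only: uncharged, non-aromatic organic-subset atoms, no bracket atoms, and single-digit ring closures. Hydrogen counts follow the usual default-valence rule for the SMILES organic subset.

```python
# Default valences for the SMILES organic subset (assumed: neutral, non-aromatic atoms).
VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def implicit_h(symbol, bonds):
    """Hydrogens needed to reach the smallest default valence >= bonds."""
    for valence in VALENCES[symbol]:
        if valence >= bonds:
            return valence - bonds
    return 0


def heavy_atoms_with_h(smiles):
    """Return [(heavy_atom_symbol, implicit_hydrogen_count), ...] in SMILES order."""
    symbols, bond_sum = [], []
    prev, pending = None, 1          # previous atom index, order of the next bond
    branch_stack, ring_open = [], {}
    i = 0
    while i < len(smiles):
        ch, two = smiles[i], smiles[i:i + 2]
        if two in ("Cl", "Br"):      # two-letter symbols are checked first
            sym, step = two, 2
        elif ch in VALENCES:
            sym, step = ch, 1
        else:
            sym, step = None, 1

        if sym is not None:          # new heavy atom, bonded to the previous one
            symbols.append(sym)
            bond_sum.append(0)
            cur = len(symbols) - 1
            if prev is not None:
                bond_sum[prev] += pending
                bond_sum[cur] += pending
            prev, pending = cur, 1
        elif ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        elif ch == "(":              # branch start: remember the attachment point
            branch_stack.append(prev)
        elif ch == ")":              # branch end: return to the attachment point
            prev = branch_stack.pop()
        elif ch.isdigit():           # ring-closure digit: open or close a ring bond
            if ch in ring_open:
                other, order = ring_open.pop(ch)
                order = max(order, pending)
                bond_sum[prev] += order
                bond_sum[other] += order
            else:
                ring_open[ch] = (prev, pending)
            pending = 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
        i += step

    return [(s, implicit_h(s, b)) for s, b in zip(symbols, bond_sum)]


# Acetic acid: no 'H' appears in the string, yet four hydrogens are implied.
print(heavy_atoms_with_h("CC(=O)O"))  # [('C', 3), ('C', 0), ('O', 0), ('O', 1)]
```

Pairing each heavy atom with a hydrogen count in this way is one possible reading of "hydrogen atoms as implicitly linked to heavy atoms"; the paper itself should be consulted for the actual representation.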