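Retention time
Transformer
Representation (politics)
Chromatography
Computer science
Artificial intelligence
Chemistry
Engineering
Electrical engineering
Voltage
Politics
Political science
Law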
Authors
Sargol Mazraedoost, Hadi Sedigh Malekroodi, Petar Žuvela, Myunggi Yi, J. Jay Liu
Identifier
DOI:10.1021/acs.jcim.5c00167
Abstract
Accurate retention time (RT) prediction in liquid chromatography remains a significant consideration in molecular analysis. In this study, we explore the use of a transformer-based language model to predict RTs by treating simplified molecular input line entry system (SMILES) sequences as textual input, an approach that has not been previously utilized in this field. Our architecture combines a pretrained RoBERTa (robustly optimized BERT approach, a variant of BERT) with bidirectional long short-term memory (BiLSTM) networks to predict retention times in reversed-phase high-performance liquid chromatography (RP-HPLC). The METLIN small molecule retention time (SMRT) data set, comprising 77,980 small molecules after preprocessing, was encoded using SMILES notation and processed through a tokenizer to enable molecular representation as sequential data. The proposed transformer-LSTM architecture incorporates layer fusion from multiple transformer layers and bidirectional sequence processing, achieving superior performance compared to existing methods, with a mean absolute error (MAE) of 26.23 s, a mean absolute percentage error (MAPE) of 3.25%, and an R-squared (R²) value of 0.91. The model's explainability was demonstrated through attention visualization, revealing its focus on key molecular features that can influence RT. Furthermore, we evaluated the model's transfer learning capabilities across ten data sets from the PredRet database, demonstrating robust performance across different chromatographic conditions with consistent improvement over previous approaches. Our results suggest that the hybrid model presents a valuable approach for predicting RT in liquid chromatography, with potential applications in metabolomics and small molecule analysis.
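The abstract reports three error metrics (MAE, MAPE, and R²). As a reading aid, the sketch below shows how these are conventionally computed from observed and predicted retention times. It uses the standard textbook definitions and is not the authors' evaluation code; the function names and the sample values are illustrative assumptions and do not come from the paper.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the same unit as the inputs (seconds here)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up retention times in seconds, for illustration only (not from the paper).
observed = [600.0, 720.0, 840.0, 900.0]
predicted = [610.0, 700.0, 850.0, 880.0]

print(f"MAE  = {mae(observed, predicted):.2f} s")
print(f"MAPE = {mape(observed, predicted):.2f} %")
print(f"R^2  = {r_squared(observed, predicted):.3f}")
```

MAE is expressed in seconds and MAPE is relative to each compound's retention time, so the two can rank models differently when a data set mixes early- and late-eluting compounds.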