Machine Learning-Based SERS Chemical Space for Two-Way Prediction of Structures and Spectra of Untrained Molecules

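Chemistry, Chemical space, Molecule, Space (punctuation), Spectral line, Computational chemistry, Organic chemistry, Biochemistry, Quantum mechanics, Physics, Linguistics, Drug discovery, Philosophy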

Authors

Jaslyn Ru Ting Chen, Emily Xi Tan, Jingxiang Tang, Shi Xuan Leong, Sean Kai Xun Hue, Chi Seng Pun, In Yee Phang, Xing Yi Ling

Source

Journal: Journal of the American Chemical Society [American Chemical Society]
Date: 2025-02-14; Volume (issue): 147 (8); Pages: 6654-6664; Cited by: 8

Identifiers

DOI: 10.1021/jacs.4c15804

Abstract

Identifying unknown molecules beyond existing databases remains challenging in surface-enhanced Raman scattering (SERS) spectroscopy. Conventional SERS analysis relies on matching experimental and cataloged spectra, limiting identification to known molecules in databases. With a vast chemical space of >10⁶⁰ molecules, it is impractical to obtain the spectra of every molecule and rely solely on in silico techniques for spectral predictions. Here, we showcase an ML-based SERS chemical space that leverages key spectra-structure correlations to achieve two-way spectra-to-structure and structure-to-spectra predictions for untrained molecules with a >90% average accuracy. Using a SERS chemical space comprising 38 linear molecules from four classes (alcohols, aldehydes, amines, and carboxylic acids), our experimental and in silico studies reveal underlying spectral features that enable the prediction of untrained molecules represented by two molecular descriptors (functional group and carbon chain length). For forward spectra-to-structure predictions, we devise a two-step "classification and regression" ML framework to sequentially predict the functional group and carbon chain length of untrained molecules with 100% accuracy and ≤1 carbon difference, respectively. In addition, using an eXtreme Gradient Boosting (XGBoost) regressor trained on the two molecular descriptors, we attain inverse structure-to-spectra prediction with a high average cosine similarity of 90.4% between the predicted and experimental spectra. Our ML-based SERS chemical space represents a shift in molecular identification from traditional spectral matching to predictive modeling of spectra-structure relationships. These insights could motivate the expansion of SERS chemical spaces and realize demands for present and future SERS technologies for accurate unknown identification across diverse fields.
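
The abstract reports the inverse (structure-to-spectra) result as an average cosine similarity between predicted and experimental spectra. As a rough illustration of that metric only, the sketch below computes cosine similarity for spectra sampled on a shared wavenumber grid. The function names and the toy intensity values are hypothetical, and the abstract does not describe the authors' preprocessing or evaluation code, so this should be read as one plausible way to compute the score rather than the paper's procedure.

```python
import math
from typing import Iterable, Sequence, Tuple


def cosine_similarity(predicted: Sequence[float], experimental: Sequence[float]) -> float:
    """Cosine of the angle between two spectra treated as intensity vectors.

    Assumes both spectra are sampled on the same wavenumber grid.
    """
    if len(predicted) != len(experimental):
        raise ValueError("spectra must share the same wavenumber grid")
    dot = sum(p * e for p, e in zip(predicted, experimental))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_e = math.sqrt(sum(e * e for e in experimental))
    if norm_p == 0.0 or norm_e == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero spectrum")
    return dot / (norm_p * norm_e)


def average_cosine_similarity(
    pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Mean cosine similarity over (predicted, experimental) spectrum pairs."""
    scores = [cosine_similarity(p, e) for p, e in pairs]
    if not scores:
        raise ValueError("at least one spectrum pair is required")
    return sum(scores) / len(scores)


# Hypothetical toy intensities, not data from the paper.
experimental = [0.0, 1.0, 3.0, 1.0, 0.0]
scaled = [0.0, 2.0, 6.0, 2.0, 0.0]    # same band shape, twice the intensity
shifted = [1.0, 3.0, 1.0, 0.0, 0.0]   # same band shape, moved by one grid point

print(cosine_similarity(scaled, experimental))
print(cosine_similarity(shifted, experimental))
print(average_cosine_similarity([(scaled, experimental), (shifted, experimental)]))
```

Because the score depends only on the angle between the two intensity vectors, a uniformly rescaled spectrum still scores 1, while a shifted band lowers the score. That makes the metric sensitive to peak positions and relative intensities rather than absolute signal strength.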
