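Quantitative structure–activity relationship
Applicability domain
Robustness (evolution)
Biological system
Similarity (geometry)
Sensitivity (control systems)
Computer science
Chemistry
Artificial intelligence
Machine learning
Engineering
Biochemistry
Electronic engineering
Biology
Image (mathematics)
Gene
Authors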
Shifa Zhong,Xiaohong Guan
Source
Journal: Environmental Science and Technology Letters
[American Chemical Society]
Date: 2023-09-20
Volume (issue): 10 (10): 872-877
Citations: 5
Identifier
DOI:10.1021/acs.estlett.3c00599
Abstract
In this study, we developed quantitative structure–activity relationship (QSAR) models for water contaminants' activities/properties by fine-tuning GPT-3 models. We also proposed a novel masked atom importance (MAI) approach for model interpretation and an OpenAIEmbedding similarity-based method for determining the applicability domain. We utilized the Simplified Molecular-Input Line-Entry System (SMILES) of contaminants and their corresponding activities/properties from three data sets: pKd, Koc, and Solubility. These were used as input prompts and completions, respectively, to fine-tune four GPT-3 models (Davinci, Curie, Babbage, and Ada) obtained from OpenAI. The Babbage model demonstrated superior performance for the pKd data set, while the Davinci model excelled with the Koc and Solubility data sets, even outperforming molecular fingerprint (MF) CatBoost-based QSAR models. The MAI interpretation results were qualitatively consistent with the SHapley Additive exPlanations (SHAP) interpretation but exhibited less sensitivity in quantitative analysis. The OpenAIEmbedding similarity-based applicability domain determination approach showed efficacy comparable to that of the MF-based similarity approach but with added robustness. This study underscores the potential of large language models in developing QSAR models, paving the way for further advancements in QSAR modeling using state-of-the-art language models.
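The abstract gives no implementation details, so the following is only a rough sketch of two ideas it mentions: pairing SMILES strings with property values as prompts and completions, and using embedding similarity to judge whether a query molecule falls inside the applicability domain. It is not the authors' code. The separator, stop token, number formatting, similarity threshold, function names, and all numeric values are hypothetical choices made for illustration, and the embeddings are toy vectors standing in for whatever an embedding model would return.

```python
import json
import math

# Hypothetical formatting choices; the abstract does not specify them.
PROMPT_SEPARATOR = "\n\n###\n\n"
STOP_TOKEN = " END"


def to_finetune_record(smiles, value, decimals=2):
    """Build one prompt/completion pair from a SMILES string and a property value."""
    return {
        "prompt": smiles + PROMPT_SEPARATOR,
        "completion": " " + f"{value:.{decimals}f}" + STOP_TOKEN,
    }


def to_jsonl(pairs):
    """Serialize (SMILES, value) pairs as JSON Lines, one record per line."""
    return "\n".join(json.dumps(to_finetune_record(s, v)) for s, v in pairs)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def in_applicability_domain(query_embedding, training_embeddings, threshold=0.8):
    """Treat a query as in-domain if its most similar training embedding
    reaches the (hypothetical) similarity threshold."""
    best = max(cosine_similarity(query_embedding, t) for t in training_embeddings)
    return best >= threshold, best


if __name__ == "__main__":
    # Placeholder values, not taken from the pKd, Koc, or Solubility data sets.
    toy_pairs = [("c1ccccc1", 1.5), ("CCO", 0.25)]
    print(to_jsonl(toy_pairs))

    # Toy 2-D vectors standing in for real molecule embeddings.
    training = [[0.0, 1.0], [1.0, 1.0]]
    print(in_applicability_domain([1.0, 0.2], training))
```

A nearest-neighbor threshold is only one way to turn similarities into an in-domain decision; averaging over the k most similar training molecules would be an equally plausible reading of "similarity-based".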