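Hyperparameter
Pipeline (software)
Artificial neural network
Test apparatus
Artificial intelligence
Kernel (algebra)
Computer science
Regression
Machine learning
Kriging
Density functional theory
Algorithm
Chemistry
Mathematics
Computational chemistry
Statistics
Discrete mathematics
Programming language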
Authors
Robin Lawler, Yao-Hao Liu, Nessa Majaya, Omar Allam, Hyunchul Ju, Jin Young Kim, Seung Soon Jang
Identifier
DOI: 10.1021/acs.jpca.1c05031
Abstract
In this study, we propose a novel method for predicting pKa across a diverse set of acids that combines density functional theory (DFT) with machine learning (ML). First, DFT at the B3LYP/6-31++G**/SM8 level is used to predict pKa, yielding a mean absolute error (MAE) of 1.85 pKa units. These DFT-predicted pKa values are then used as one of 10 molecular descriptors for ML models trained on experimental data. Kernel ridge regression (KRR), Gaussian process regression (GPR), and an artificial neural network (ANN) are optimized using three pipelines: Pipeline 1 involves only hyperparameter optimization (HPO); Pipeline 2 involves HPO followed by relative contribution analysis (RCA) and recursive feature elimination (RFE); and Pipeline 3 involves HPO followed by RCA and RFE on an expanded set of composite features. KRR with Pipeline 3 gives the best pKa prediction, with an MAE of 0.60 pKa units. This model was then used to predict the pKa values of 37 novel acids. The two most important features were found to be the number of hydrogen atoms in the molecule and the degree of oxidation of the acid. The predicted pKa values are documented for future reference.
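To illustrate the KRR + HPO step (Pipeline 1), here is a minimal from-scratch sketch in standard-library Python; it is not the authors' code. It assumes a Gaussian (RBF) kernel, a single holdout validation split, and a small hand-picked grid over the ridge penalty `lam` and kernel width `gamma`, scored by MAE as in the abstract. The abstract does not state the kernel, search strategy, or grid, and the names `KRRSketch`, `solve`, and `grid_search` are hypothetical. The demo data are synthetic, not the paper's 10-descriptor dataset.

```python
import math
import random
from itertools import product


def rbf_kernel(a, b, gamma):
    # ASSUMPTION: Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2).
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class KRRSketch:
    """Hypothetical KRR: coef = (K + lam * I)^-1 y, f(x) = sum_i coef_i * k(x_i, x)."""

    def __init__(self, lam, gamma):
        self.lam, self.gamma = lam, gamma

    def fit(self, X, y):
        n = len(X)
        K = [[rbf_kernel(X[i], X[j], self.gamma) + (self.lam if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.X, self.coef = X, solve(K, y)
        return self

    def predict(self, X):
        return [sum(c * rbf_kernel(xi, x, self.gamma) for c, xi in zip(self.coef, self.X))
                for x in X]


def mae(y_true, y_pred):
    """Mean absolute error, the metric reported in the abstract."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def grid_search(X_tr, y_tr, X_val, y_val, lams, gammas):
    """ASSUMPTION: exhaustive grid + one holdout split; returns (mae, lam, gamma)."""
    best = None
    for lam, gamma in product(lams, gammas):
        score = mae(y_val, KRRSketch(lam, gamma).fit(X_tr, y_tr).predict(X_val))
        if best is None or score < best[0]:
            best = (score, lam, gamma)
    return best


if __name__ == "__main__":
    # Synthetic toy data (NOT the paper's dataset): 2 descriptors -> pKa-like target.
    rng = random.Random(0)
    X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(60)]
    y = [4.0 + 3.0 * math.sin(3 * a) - 2.0 * b for a, b in X]
    best = grid_search(X[:45], y[:45], X[45:], y[45:],
                       [1e-3, 1e-2, 1e-1], [0.5, 2.0, 8.0])
    print("best validation MAE %.3f at lam=%g, gamma=%g" % best)
```

In practice a library implementation such as scikit-learn's `KernelRidge` (with `alpha`, `kernel`, and `gamma` parameters) plus cross-validated search would normally replace the hand-written solver and single split; the explicit version only makes the fitted form f(x) = Σᵢ cᵢ k(xᵢ, x), with c = (K + λI)⁻¹y, visible.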
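Pipeline 3 expands the descriptor set with composite features before eliminating features. The sketch below illustrates that idea under two loudly labeled assumptions: composite features are taken to be pairwise products of the original descriptors, and elimination is a greedy backward elimination on validation error. This is a wrapper-style stand-in, not the paper's RCA + RFE; classic RFE instead ranks features by model weights or importances. `score_fn` is a hypothetical callable that would, for example, retrain a model on a feature subset and return its validation MAE. The descriptor names in the demo are illustrative only.

```python
from itertools import combinations


def expand_features(rows, names):
    """ASSUMPTION: composite features = pairwise products of original descriptors."""
    pairs = list(combinations(range(len(names)), 2))
    new_names = names + [f"{names[i]}*{names[j]}" for i, j in pairs]
    new_rows = [row + [row[i] * row[j] for i, j in pairs] for row in rows]
    return new_rows, new_names


def backward_eliminate(names, score_fn, min_features=1):
    """Greedy backward elimination (a stand-in, NOT the paper's RCA + RFE).

    score_fn(subset_of_names) returns a validation error (lower is better).
    Each round drops the feature whose removal yields the lowest error; it stops
    once every possible removal would make the error worse.
    """
    current = list(names)
    best_err = score_fn(current)
    while len(current) > min_features:
        trials = [(score_fn([f for f in current if f != drop]), drop) for drop in current]
        err, drop = min(trials)
        if err > best_err:
            break
        current.remove(drop)
        best_err = err
    return current, best_err


if __name__ == "__main__":
    # Hypothetical descriptor names and a made-up error function, for illustration.
    rows, names = expand_features([[1.0, 2.0, 3.0]], ["n_H", "ox_state", "pKa_DFT"])
    print(names)
    useful = {"n_H", "ox_state"}

    def toy_error(subset):
        # Large penalty for each missing useful feature, small cost per feature kept.
        return len(useful - set(subset)) + 0.01 * len(subset)

    print(backward_eliminate(names, toy_error))
```

Accepting ties (dropping a feature when the error does not worsen) biases the search toward smaller feature sets, which matches the goal of keeping only the most informative descriptors.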