Ensemble hologram quantitative structure activity relationship model of the chromatographic retention index of aldehydes and ketones

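Quantitative structure-activity relationship; Chemistry; Molecular descriptor; Overfitting; Biological system; Artificial intelligence; Computer science; Artificial neural network; Stereochemistry; Biology
Authors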
Bin Lei,Yunlei Zang,Zhiwei Xue,Yiqing Ge,Wei Li,Qian Zhai,Long Jiao
Source
Journal: Sepu [Science Press]
Volume (issue): 39 (3): 331-337 Citations: 2
Identifier
DOI:10.3724/sp.j.1123.2020.06011
Abstract

The chromatographic retention index (RI) is an important parameter for describing the retention behavior of substances in chromatographic analysis. Experimentally determining the RI values of different aldehydes and ketones on various polar stationary phases is expensive and time-consuming. Quantitative structure-activity relationship (QSAR) modeling is an important chemometric technique that has been widely used to correlate the properties of chemicals with their molecular structures. QSAR models can calculate the properties of a molecule whether or not those properties have been experimentally determined. It is therefore worthwhile to establish a QSAR model for predicting the RI values of aldehydes and ketones. Hologram QSAR (HQSAR) is a highly efficient QSAR approach that can easily generate QSAR models with good statistics and high prediction accuracy. The HQSAR approach introduces a specific fragment fingerprint, known as a molecular hologram, and uses it as the structural descriptor for building the QSAR model. QSAR studies generally build individual HQSAR models. However, individual QSAR models are often affected by underfitting and overfitting. Ensemble modeling, which integrates several individual models through a consensus strategy, can overcome the shortcomings of individual models. It is therefore worth studying whether ensemble modeling can improve the prediction ability of the HQSAR method and yield more accurate and reliable QSAR models. Accordingly, this study investigates QSAR models for the chromatographic RI of aldehydes and ketones using ensemble modeling and the HQSAR method. Two individual HQSAR models, covering 34 compounds on two stationary phases, DB-210 and HP-Innowax, were established. The prediction ability of the two models was assessed by external test set validation and leave-one-out cross validation (LOO-CV).
The 34 investigated compounds were randomly assigned to two groups: Group Ⅰ comprised 26 compounds and Group Ⅱ comprised 8 compounds. In the external test set validation, Group Ⅰ was used to manually optimize the two fragment parameters, fragment distinction (FD) and fragment size (FS), and to build the HQSAR models. Group Ⅱ was used as the test set to assess the predictive performance of the developed models. For the DB-210 stationary phase, the optimal individual HQSAR model was obtained by setting the FD and FS to "donor/acceptor atoms (DA)" and 1-9, respectively. For the HP-Innowax stationary phase, the optimal individual HQSAR model was obtained by setting the FD and FS to "DA" and 4-7, respectively. The squared correlation coefficient of cross validation ($q_{\mathrm{cv}}^2$), the concordance correlation coefficient (CCC), the squared correlation coefficient of external validation ($q_{\mathrm{ext}}^2$), and the predictive squared correlation coefficients ($Q_{F2}^2$ and $Q_{F3}^2$) of the two models for predicting the RI values were 0.935 and 0.909, 0.953 and 0.960, 0.925 and 0.927, 0.922 and 0.918, and 0.931 and 0.927, respectively. The results of the two validations show that there is a quantitative relationship between the molecular structure of these compounds and the RI value, and that the HQSAR model is capable of modeling this relationship. Next, the ensemble HQSAR models were established by arithmetic averaging, using the four individual HQSAR models with the highest accuracy as the sub-models. The ensemble HQSAR models were validated by external test set validation and LOO-CV. The $q_{\mathrm{cv}}^2$, CCC, $q_{\mathrm{ext}}^2$, $Q_{F2}^2$, and $Q_{F3}^2$ for predicting the RI values on the DB-210 and HP-Innowax stationary phases were 0.927 and 0.919, 0.956 and 0.979, 0.929 and 0.963, 0.927 and 0.958, and 0.935 and 0.963, respectively.
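The abstract reports these validation statistics without defining them. For orientation, the forms commonly used in the QSAR validation literature are sketched below. Whether the authors used exactly these expressions is an assumption, and the symbols ($y_i$, $\hat{y}_i$, TR, EXT, and so on) are introduced here only for illustration. In particular, the abstract does not say how the external-validation $q_{\mathrm{ext}}^2$ was computed; $Q_{F1}^2$ is shown as one common choice.

```latex
% Assumed notation (requires amsmath):
%   y_i            observed RI of compound i
%   \hat{y}_i      RI predicted by the model
%   \hat{y}_{(i)}  RI predicted for compound i by a model fitted without it (LOO-CV)
%   TR, EXT        training set and external test set, of sizes n_TR and n_EXT
%   \bar{y}        mean observed RI over the indicated set
\begin{align}
q_{\mathrm{cv}}^2 &= 1 - \frac{\sum_{i=1}^{n} \bigl(y_i - \hat{y}_{(i)}\bigr)^2}
                              {\sum_{i=1}^{n} \bigl(y_i - \bar{y}\bigr)^2} \\
Q_{F1}^2 &= 1 - \frac{\sum_{i \in \mathrm{EXT}} (y_i - \hat{y}_i)^2}
                     {\sum_{i \in \mathrm{EXT}} (y_i - \bar{y}_{\mathrm{TR}})^2} \\
Q_{F2}^2 &= 1 - \frac{\sum_{i \in \mathrm{EXT}} (y_i - \hat{y}_i)^2}
                     {\sum_{i \in \mathrm{EXT}} (y_i - \bar{y}_{\mathrm{EXT}})^2} \\
Q_{F3}^2 &= 1 - \frac{\sum_{i \in \mathrm{EXT}} (y_i - \hat{y}_i)^2 / n_{\mathrm{EXT}}}
                     {\sum_{i \in \mathrm{TR}} (y_i - \bar{y}_{\mathrm{TR}})^2 / n_{\mathrm{TR}}} \\
\mathrm{CCC} &= \frac{2 \sum_{i \in \mathrm{EXT}} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}
                     {\sum_{i \in \mathrm{EXT}} (y_i - \bar{y})^2
                      + \sum_{i \in \mathrm{EXT}} (\hat{y}_i - \bar{\hat{y}})^2
                      + n_{\mathrm{EXT}} (\bar{y} - \bar{\hat{y}})^2}
\end{align}
```

All five statistics equal 1 for perfect predictions. The three $Q^2$ variants differ only in the reference variance used in the denominator.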
Compared to the individual HQSAR models, the established ensemble HQSAR models show better robustness and accuracy, thus establishing that ensemble modeling is an effective approach. The combination of HQSAR and the ensemble modeling method is a practicable and promising method for studying and predicting the RI values of aldehydes and ketones.
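The consensus step and the validation statistics described in the abstract can be illustrated with a short sketch. This is not the authors' code: the function names, the generic `fit`/`predict` callables, and the RI numbers in the demo are all hypothetical, and the metric formulas are the commonly used literature definitions, assumed here because the abstract does not state them. The HQSAR sub-models themselves (molecular holograms, FD and FS settings) are not reproduced; the sketch starts from their predictions.

```python
from statistics import fmean


def ensemble_average(sub_model_predictions):
    """Arithmetic-mean consensus: average the sub-model predictions per compound."""
    return [fmean(values) for values in zip(*sub_model_predictions)]


def loo_q2(xs, ys, fit, predict):
    """Leave-one-out q2: refit without each compound, then predict it.

    `fit(xs, ys)` returns a model and `predict(model, x)` returns one value;
    both are placeholders for whatever regression method is used.
    """
    press = 0.0
    for i in range(len(ys)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - predict(model, xs[i])) ** 2
    mean_y = fmean(ys)
    return 1.0 - press / sum((y - mean_y) ** 2 for y in ys)


def q2_f2(y_ext, y_pred):
    """Q2_F2: test-set errors relative to scatter around the test-set mean."""
    mean_ext = fmean(y_ext)
    press = sum((y - p) ** 2 for y, p in zip(y_ext, y_pred))
    return 1.0 - press / sum((y - mean_ext) ** 2 for y in y_ext)


def q2_f3(y_ext, y_pred, y_train):
    """Q2_F3: mean squared test error relative to training-set variance."""
    mean_tr = fmean(y_train)
    mse_ext = sum((y - p) ** 2 for y, p in zip(y_ext, y_pred)) / len(y_ext)
    var_tr = sum((y - mean_tr) ** 2 for y in y_train) / len(y_train)
    return 1.0 - mse_ext / var_tr


def ccc(y_obs, y_pred):
    """Concordance correlation coefficient between observed and predicted values."""
    mean_o, mean_p = fmean(y_obs), fmean(y_pred)
    cov = sum((y - mean_o) * (p - mean_p) for y, p in zip(y_obs, y_pred))
    ss_o = sum((y - mean_o) ** 2 for y in y_obs)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    return 2.0 * cov / (ss_o + ss_p + len(y_obs) * (mean_o - mean_p) ** 2)


if __name__ == "__main__":
    # Made-up RI values for illustration only; not data from the paper.
    y_train = [700.0, 820.0, 905.0, 1010.0, 1150.0]
    y_test = [800.0, 950.0, 1100.0]
    sub_models = [  # predictions of four hypothetical sub-models on the test set
        [790.0, 960.0, 1090.0],
        [815.0, 940.0, 1120.0],
        [805.0, 955.0, 1085.0],
        [798.0, 948.0, 1105.0],
    ]
    consensus = ensemble_average(sub_models)
    print("consensus:", consensus)
    print("Q2_F2:", q2_f2(y_test, consensus))
    print("Q2_F3:", q2_f3(y_test, consensus, y_train))
    print("CCC:", ccc(y_test, consensus))
```

Averaging works on predictions only, so the sub-models can use different fragment settings as long as they predict the same compounds in the same order.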
