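Convolutional neural network
Mean squared error
Artificial intelligence
Quantitative structure-activity relationship
Transfer learning
Applicability domain
Pattern recognition (psychology)
Robustness (evolution)
Molecular descriptor
Computer science
Artificial neural network
Chemistry
Biological system
Machine learning
Mathematics
Statistics
Gene
Biology
Biochemistry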
Authors
Shifa Zhong, Jiajie Hu, Xiong Yu, Huichun Zhang
Identifier
DOI:10.1016/j.cej.2020.127998
Abstract
In this study, we used molecular images as a representation for organic compounds and combined them with a convolutional neural network (CNN) to develop quantitative structure-activity relationships (QSARs) for predicting compound rate constants toward OH radicals. We applied transfer learning and data augmentation to train molecular image-CNN models and used the Gradient-weighted Class Activation Mapping (Grad-CAM) method to interpret them. Results showed that data augmentation and transfer learning can effectively enhance the robustness and predictive performance of the models: the root-mean-square error (RMSE) on the test dataset (RMSEtest) decreased from 0.395–0.45 to 0.284–0.339 after applying data augmentation, and the RMSE on the training dataset (RMSEtrain) decreased from 0.452–0.592 to 0.123–0.151 after applying transfer learning. The resulting molecular image-CNN models showed predictive performance (RMSEtest 0.284–0.339) comparable to that of molecular fingerprint-based models (RMSEtest 0.30–0.35). Grad-CAM interpretation showed that the molecular image-CNN models correctly chose the molecular features in the images and identified key functional groups that influenced the reactivity. The applicability domain analysis showed that the molecular image-CNN models have a broader applicability domain than molecular fingerprint-based models, and that the reactivity of any new compound with a maximum similarity of over 0.85 to the compounds in the training dataset can be reliably predicted. This study demonstrated that molecular image-CNN is a new tool for developing QSARs for environmental applications and can be used to build trustworthy models that make meaningful predictions.
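The two quantities the abstract relies on, the RMSE and the maximum-similarity criterion for the applicability domain, can be illustrated with a minimal Python sketch. The abstract does not say which similarity measure was used, so Tanimoto similarity over sets of fingerprint on-bits is an assumption here. The function names (`rmse`, `tanimoto`, `in_applicability_domain`) are hypothetical and not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two sets of on-bit indices (assumed measure)."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def in_applicability_domain(query_bits, training_bits, threshold=0.85):
    """True if the query's maximum similarity to any training compound
    exceeds the threshold (0.85 is the value quoted in the abstract)."""
    best = max((tanimoto(query_bits, t) for t in training_bits), default=0.0)
    return best > threshold
```

Under these assumptions, a new compound would be flagged as reliably predictable only when `in_applicability_domain` returns `True` for its fingerprint against the training set.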