皮肤刺激
分类器(UML)
机器学习
二元分类
接收机工作特性
计算机科学
生物信息学
二进制数
数据挖掘
人工智能
化学
支持向量机
皮肤病科
数学
医学
生物化学
算术
基因
作者
Zejun Huang,Shang Lou,Haoqiang Wang,Weihua Li,Guixia Liu,Yun Tang
标识
DOI:10.1021/acs.chemrestox.3c00332
摘要
Skin Corrosion/Irritation (Corr./Irrit.) has long been a health hazard in the Globally Harmonized System (GHS). Several in silico models have been built to predict Skin Corr./Irrit. as an alternative to the increasingly restricted animal testing. However, current studies are limited by data amount/quality and model availability. To address these issues, we compiled a traceable consensus GHS data set comprising 731 Corr., 1283 Irrit., and 1205 negative (Neg.) samples from 6 governmental databases and 2 external data sets. Then, a series of binary classifiers were developed with five machine learning (ML) algorithms and six molecular representations. For 10-fold cross-validation, the best Corr. vs Neg. classifier achieved an Area Under the Receiver Operating Characteristic Curve (AUC) of 97.1%, while the best Irrit. vs Neg. classifier achieved an AUC of 84.7%. Compared with existing in silico tools on external validation, our Attentive FP classifiers showed the highest metrics on Corr. vs Neg. and the second highest accuracy on Irrit. vs Neg. The SHapley Additive exPlanation approach was further applied to figure out important molecular features, and the attention weights were visualized to perform interpretable prediction. Structural alerts associated with Skin Corr./Irrit. were also identified. The interpretable Attentive FP classifiers were integrated into the software AttentiveSkin at https://github.com/BeeBeeWong/AttentiveSkin. The conventional ML classifiers are also provided on our platform admetSAR at http://lmmd.ecust.edu.cn/admetsar2/. Considering the data deficiency and the limited model availability of Skin Corr./Irrit., we believe that our data set and models could facilitate chemical safety assessment and relevant studies.
科研通智能强力驱动
Strongly Powered by AbleSci AI