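Python (programming language)
Machine learning
Artificial intelligence
Computer science
Convolutional neural network
Gradient boosting
Boosting (machine learning)
Deep learning
Rendering (computer graphics)
Programming language
Random forest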
Authors
Xin Lei, Haiying Yu, Sisi Liu, Guang-Guo Ying, Chang-Er Chen
Identifier
DOI:10.1016/j.scitotenv.2024.171143
Abstract
Effectively identifying persistent organic pollutants (POPs) with extensive organic chemical datasets poses a formidable challenge but is of utmost importance. Leveraging machine learning techniques can enhance this process, but previous models often demanded advanced programming skills and high-end computing resources. In this study, we harnessed the simplicity of PyCaret, a Python-based package, to construct machine-learning models for POP screening based on 2D molecular descriptors. We compared the performance of these models against a deep convolutional neural network (DCNN) model. Utilising minimal Python code, we generated several models that exhibited superior or comparable performance to the DCNN. The most outstanding performer, the Light Gradient Boosting Machine (LGBM), achieved an accuracy of 96.20%, an AUC of 97.70%, and an F1 score of 82.58%. This model outshone the DCNN model. Furthermore, it excelled in identifying POPs within the REACH PBT and compiled industrial chemical lists. Our findings highlight the accessibility and simplicity of PyCaret, requiring only a few lines of code, rendering it suitable for non-computing professionals in environmental sciences. The ability of low-code machine learning tools (e.g., PyCaret) to facilitate model comparison and interpretation holds promise, encouraging prompt assessment and management of chemical substances.
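The abstract reports three metrics for the LGBM model (accuracy, AUC, and F1 score). As a reading aid, the sketch below shows one common way these quantities are computed for a binary POP / non-POP screen, using only the Python standard library. It is not the authors' code and does not use PyCaret; the function names, the 0.5 decision threshold, and the ten-compound toy dataset are all assumptions made purely for illustration and are not data from the study.

```python
# Illustrative only: standard definitions of accuracy, F1, and ROC AUC
# for a binary classifier. Labels: 1 = POP, 0 = non-POP (assumed encoding).

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of all compounds classified correctly."""
    tp, tn, _, _ = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (POP) class."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 10 compounds, 2 of which are POPs (imbalanced).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1, 0.2, 0.3, 0.05, 0.15, 0.25, 0.35]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

On this toy set the accuracy is 0.8 while the F1 score is only 0.5, because one of the two positives is missed and one negative is flagged. This may help explain why, on a dataset where POPs are a small minority, a model can show a high accuracy and AUC alongside a noticeably lower F1 score, as in the figures quoted above.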