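Context (archaeology)
Computer science
Domain (mathematical analysis)
Data mining
Set (abstract data type)
Public domain
Predictive modelling
Biochemical engineering
Machine learning
Engineering
Mathematics
Philosophy
Paleontology
Mathematical analysis
Theology
Programming language
Biology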
Authors
Filippo Lunghini, Gilles Marcou, Philippe Gantzer, Philippe Azam, Dragos Horvath, Erik Van Miert, Alexandre Varnek
Identifier
DOI: 10.1080/1062936x.2019.1697360
Abstract
The European Registration, Evaluation, Authorization and Restriction of Chemical Substances Regulation requires marketed chemicals to be evaluated for Ready Biodegradability (RB), and accepts in silico prediction as a valid alternative to experimental testing. However, currently available models may not be suitable for predicting compounds of industrial interest, owing to limited accuracy and restricted applicability domains. In this work, we present a new and extended RB dataset (2830 compounds), obtained by merging several public data sources. It was used to train classification models, which were externally validated and benchmarked against existing tools on a set of 316 compounds from an industrial context. The new models performed well in terms of predictive power (Balanced Accuracy (BA) = 0.74–0.79) and data coverage (83–91%). The Generative Topographic Mapping approach identified several chemotypes and structural motifs unique to the industrial dataset, highlighting the chemical classes for which currently available models may give less reliable predictions. Finally, public and industrial data were merged into a Global dataset containing 3146 compounds. This is the largest dataset reported in the literature so far, and it covers some chemotypes absent from the public data. Thus, the predictive model developed on the Global dataset has a larger applicability domain than the existing ones.
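The two figures of merit quoted in the abstract, Balanced Accuracy and data coverage, can be illustrated with a minimal sketch. It assumes the usual binary definition of BA as the mean of sensitivity and specificity, and takes coverage to be the fraction of compounds that fall inside a model's applicability domain; the abstract does not spell out either definition. The function names, the `in_domain` flags, and the toy labels below are hypothetical and are not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels.

    Assumed encoding: 1 = readily biodegradable, 0 = not readily biodegradable.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def coverage(in_domain):
    """Fraction of compounds flagged as inside the applicability domain."""
    return sum(1 for flag in in_domain if flag) / len(in_domain)


# Hypothetical toy data: 8 compounds, 6 of them inside the applicability domain.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
in_domain = [True, True, True, True, True, True, False, False]

# Score only the compounds the model is considered applicable to.
kept_true = [t for t, d in zip(y_true, in_domain) if d]
kept_pred = [p for p, d in zip(y_pred, in_domain) if d]

ba = balanced_accuracy(kept_true, kept_pred)
cov = coverage(in_domain)
print(f"BA = {ba:.2f}, coverage = {cov:.0%}")
```

Reporting both numbers together matters because a model can raise its BA simply by declining to predict difficult compounds, which shows up as lower coverage.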