Development of QSAR models for prediction of fish bioconcentration factors using physicochemical properties and molecular descriptors with machine learning algorithms
Bioconcentration factors (BCFs) indicate the extent to which chemical substances accumulate in organisms, and they play an important role in the environmental risk assessment of chemicals. Because BCF experiments are expensive and time-consuming, it is desirable to predict BCFs at an early stage of chemical development. In this study, we developed a quantitative structure-activity relationship (QSAR) model using physicochemical properties, environmental fate endpoints, and molecular descriptors. The physicochemical properties and environmental fate endpoints were predicted with OPERA, a QSAR software package, and the molecular descriptors were calculated with Mordred. A gradient boosting decision tree (GBDT) model was developed as the machine learning model, and multiple linear regression (MLR) and support vector machine (SVM) models were developed for comparison. The GBDT model achieved coefficients of determination (R²) of 0.923 and 0.863 for the training and test sets, respectively. These values are higher than those of the previous model and of the BCF values calculated by OPERA. The results suggest that an accurate QSAR model can be built from physicochemical properties, environmental fate endpoints, and molecular descriptors calculated from the chemical structure alone, without conducting BCF experiments. The model could therefore serve as one option for preliminary risk assessment during the early development of candidate chemicals, without investing in a large number of BCF experiments.
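To make the modeling idea concrete, the following is a minimal pure-Python sketch of least-squares gradient boosting with depth-1 decision trees (stumps), evaluated with R² on a train/test split. It is not the authors' implementation. The GBDT library, tree depth, hyperparameters, and OPERA/Mordred feature set used in the study are not given here, so the class `GradientBoostedStumps`, the helpers `fit_stump`, `predict_stump`, and `r2_score`, and the two synthetic "descriptor" columns are hypothetical stand-ins. The printed R² values will not reproduce the reported 0.923 and 0.863.

```python
"""Illustrative least-squares gradient boosting with decision stumps (not the study's code)."""
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds lie halfway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv, rv = mean(left), mean(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:
        # Every feature is constant: fall back to a single constant leaf.
        m = mean(residuals)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


class GradientBoostedStumps:
    """Hypothetical minimal GBDT: squared-error loss, depth-1 trees, shrinkage."""

    def __init__(self, n_estimators=100, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X, y):
        self.base = mean(y)  # initial prediction is the mean target
        self.stumps = []
        pred = [self.base] * len(y)
        for _ in range(self.n_estimators):
            # For squared-error loss, the negative gradient is the residual.
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            pred = [p + self.learning_rate * predict_stump(stump, row)
                    for p, row in zip(pred, X)]
        return self

    def predict(self, X):
        return [self.base + self.learning_rate * sum(predict_stump(s, row) for s in self.stumps)
                for row in X]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data only: two made-up "descriptors" and a nonlinear target.
    X = [[rng.uniform(-2, 8), rng.uniform(0, 1)] for _ in range(80)]
    y = [0.6 * a - 0.8 * max(a - 6, 0) + 0.5 * b + rng.gauss(0, 0.2) for a, b in X]
    X_train, y_train, X_test, y_test = X[:60], y[:60], X[60:], y[60:]
    model = GradientBoostedStumps(n_estimators=100, learning_rate=0.1).fit(X_train, y_train)
    print("train R2:", round(r2_score(y_train, model.predict(X_train)), 3))
    print("test  R2:", round(r2_score(y_test, model.predict(X_test)), 3))
```

Each round fits a stump to the current residuals, which are the negative gradient of the squared-error loss, and adds a shrunken copy of it to the ensemble. Comparing training and test R², as in the study, indicates how much of the fit generalizes beyond the training compounds. For real descriptor sets, deeper trees and an established library implementation would be the more practical choice.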