Authors
Lang Lei, Ruirui Pang, Zhibang Han, Dong Wu, Bing Xie, Yinglong Su
Abstract
With their continuous release into the environment, emerging contaminants (ECs) have attracted widespread attention for their potential risks, and numerous studies have been conducted on their identification, environmental behavior, bioeffects, and removal. Owing to its strength in handling high-dimensional and unstructured data, machine learning (ML), a relatively new data-driven approach, has gradually been applied in EC research. This review describes the fundamental principles, algorithms, and workflow of ML, and summarizes advances in ML applications for typical ECs (per- and polyfluoroalkyl substances, nanoparticles, antibiotic resistance genes, endocrine-disrupting chemicals, microplastics, antibiotics, and pharmaceuticals and personal care products). ML methods have shown practicability, reliability, and effectiveness in predicting or analyzing the occurrence, distribution, bioeffects, and removal of ECs, and various algorithms and derived models have been developed and optimized to achieve better performance. Moreover, the size and homogeneity of the data set strongly influence the application of ML, and choosing an appropriate ML model, given that models differ in their characteristics, is crucial for addressing the specific problems posed by a given data set. Future efforts should focus on improving data set quality and adopting more advanced algorithms, developing the potential of quantitative structure-activity relationship (QSAR) models, and broadening the applicability domains and improving the interpretability of models. In addition, the development of codeless ML tools will improve the accessibility of ML models.

Keywords: Bioeffects; emerging contaminants; environmental behavior; identification; machine learning; removal technologies

Handling Editors: Frederic Coulon and Lena Q. Ma

Funding: This work was financially supported by the Natural Science Foundation of Shanghai (22ZR1420700), Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste (19DZ2254400), Open Research Fund of State Key Laboratory of Estuarine and Coastal Research (SKLEC-KF202011), Natural Science Foundation Project of CQ (CSTC2021JCYJ-MSXMX0726), and Fundamental Research Funds for the Central Universities.