Defining and Exploring Chemical Spaces

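Keywords: Chemical space; Computer science; Operationalization; Bayesian optimization; Property (philosophy); Process (computing); Artificial intelligence; Machine learning; Biochemical engineering; Data science; Drug discovery; Bioinformatics; Engineering; Philosophy; Epistemology; Biology; Operating system
Author
Connor W. Coley
Source
Journal: Trends in Chemistry [Elsevier]
Volume (issue): 3 (2): 133-145 Citations: 78
Identifier
DOI:10.1016/j.trechm.2020.11.004
Abstract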

Virtual libraries used in molecular discovery are often too large to evaluate exhaustively, warranting the use of algorithms to help with exploration. Algorithmic approaches like Bayesian optimization, in combination with surrogate models, can help to navigate predefined chemical spaces efficiently. On-the-fly molecular generation during exploration, including generation by deep-learning-based models, enables even larger chemical spaces to be searched, although the chemical spaces of such models are defined only implicitly. Emerging approaches that incorporate reactions into machine-learning-based generation can ensure that molecules can be synthesized, similar to previously developed algorithms for reaction-based de novo design.

Designing functional molecules with desirable properties is often a challenging, multi-objective optimization. For decades, there have been computational approaches to facilitate this process through the simulation of physical processes, the prediction of molecular properties using structure–property relationships, and the selection or generation of molecular structures. This review provides an overview of some algorithmic approaches to defining and exploring chemical spaces that have the potential to operationalize the process of molecular discovery. We emphasize the potential roles of machine learning and the consideration of synthetic feasibility, which is a prerequisite to 'closing the loop'. We conclude by summarizing important directions for the future development and evaluation of these methods.

Chemical space can be thought of as the set of all possible molecules or materials. We generally consider more narrowly defined chemical spaces that are defined or constrained by the structures or functions of the molecules they contain. For example, 'drug-like chemical space' is used in the context of drug discovery to reflect the vast number of molecules that have physical properties similar to those of existing small-molecule therapeutics. While quantifying the size of a chemical space is rarely useful, it should be noted that there are far more organic molecules thought to be stable than atoms in the solar system, which is unsurprising given the combinatorics of designing molecular graphs. Here, we focus our discussion on small molecules rather than periodic materials, biomolecules, and polymers, all of which correspond to distinct 'chemical spaces'. Many studies have estimated the size of different chemical spaces [1. Bohacek R.S. et al. The art and practice of structure-based drug design: a molecular modeling perspective. Med. Res. Rev. 1996; 16: 3-50, 2. Drew K.L.M. et al. Size estimation of chemical space: how big is it? J. Pharm. Pharmacol. 2012; 64: 490-495, 3. Polishchuk P.G. et al. Estimation of the size of drug-like chemical space based on GDB-17 data. J. Comput. Aided Mol. Des. 2013; 27: 675-679] and suggested rules to organize these spaces along functional axes to improve their visualization and navigability [4. Oprea T.I. Gottfries J. Chemography: the art of navigating in chemical space. J. Comb. Chem. 2001; 3: 157-166, 5. Reymond J.-L. Awale M. Exploring chemical space for drug discovery using the Chemical Universe database. ACS Chem. Neurosci. 2012; 3: 649-657, 6. Awale M. Reymond J.-L. Web-based 3D-visualization of the DrugBank chemical space. J. Cheminform. 2016; 8: 25, 7. Probst D. Reymond J.-L. Visualization of very large high-dimensional data sets as minimum spanning trees. J. Cheminform. 2020; 12: 12].

As we have described previously, the discovery of novel molecules can be framed as a search within chemical space [8. Coley C.W. et al. Autonomous discovery in the chemical sciences part I: progress. Angew. Chem. Int. Ed. 2019; (Published online September 25, 2019. https://doi.org/10.1002/anie.201909987), 9. Coley C.W. et al. Autonomous discovery in the chemical sciences part II: outlook. Angew. Chem. Int. Ed. 2019; (Published online September 25, 2019. https://doi.org/10.1002/anie.201909989)]. The goal is to identify one or more molecules that exhibit a set of desirable properties. Besides defining these properties and a strategy to evaluate candidate molecules, the two primary considerations one must make are: (i) how to define the space; and (ii) how to explore the space. Both contribute to the search efficiency and likelihood of finding a good candidate.
These two aspects are not independent: if you are repurposing FDA-approved drugs, your chemical space is narrow enough that an exhaustive screen may be feasible, but if you have no such restriction you must employ some strategy to select which molecules to test. These strategies are typically iterative optimization routines (driven by human intuition or by quantitative experimental design) with varying degrees of sophistication, as discussed later. Navigating chemical space has been extensively written about in the context of (non-algorithmic) drug design [10. Dobson C.M. Chemical space and biology. Nature. 2004; 432: 824-828, 11. Lipinski C. Hopkins A. Navigating chemical space for biology and medicine. Nature. 2004; 432: 855-861].

The number of candidate molecules is too large to explore exhaustively, so one often imposes constraints on chemical space depending on the search strategy, the application, and the practical limitations of cost and time. These constraints look quite different when candidates are evaluated by physical rather than computational experiments. In the former case, acquiring new information about the performance of a molecule requires its physical synthesis, purification, and characterization; considerations of synthesis cost and material availability are paramount. In the latter case, one may postpone these practical considerations until after computational evaluations have identified a putative 'optimal' molecule. To bound the computational cost, the search space is still restricted using human expertise or some 'prior' on what would make a viable candidate.

This review examines strategies to define and explore chemical spaces with an emphasis on the role of machine learning and synthesizability constraints (Table 1, Key Table).
While this can be performed by subject-matter experts (e.g., medicinal chemists) in the absence of computer assistance, formalizing these concepts may eventually enable autonomous workflows to produce novel, useful outcomes with reduced reliance on human intuition and subjectivity. Elements of the concepts we cover can be found in previous articles, including a recent overview by Lemonick [12. Lemonick S. Exploring chemical space: can AI take us where no human has gone before? Chem. Eng. News. 2020; 98: 30]. We do not address visualization and instead refer readers to the work of Reymond and coworkers [5,7].

Table 1. Key Table. Categorization of Approaches to Define Chemical Spaces for Molecular Discovery and an Incomplete Set of Examples for Each (a)

Predefined
  Unconstrained: ZINC [13. Irwin J.J. et al. ZINC: a free tool to discover chemistry for biology. J. Chem. Inf. Model. 2012; 52: 1757-1768], ChEMBL [15. Gaulton A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012; 40: D1100-D1107], PubChem [14. Kim S. et al. PubChem 2019 update: improved access to chemical data. Nucleic Acids Res. 2019; 47: D1102-D1109], GDB [24. Reymond J.-L. The Chemical Space Project. Acc. Chem. Res. 2015; 48: 722-730]
  Constrained: DrugBank [16. Wishart D.S. et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006; 34: D668-D672], Enamine REAL (https://enamine.net/library-synthesis/real-compounds), WuXi Virtual Library (https://www.labnetwork.com/frontend-app/p/%5C#!/library/virtual), SAVI [32. Patel H. et al. Synthetically Accessible Virtual Inventory (SAVI). ChemRxiv. 2020; (Published online April 27, 2020. https://doi.org/10.26434/chemrxiv.12185559)], PGVL [33. Hu Q. et al. LEAP into the Pfizer Global Virtual Library (PGVL) space: creation of readily synthesizable design ideas automatically. Methods Mol. Biol. 2011; 685: 253-276], PLC [34. Nicolaou C.A. et al. The Proximal Lilly Collection: mapping, exploring and exploiting feasible chemical space. J. Chem. Inf. Model. 2016; 56: 1253-1266]

On the fly via heuristic methods
  Unconstrained: Fragment-based GAs [57. Venkatasubramanian V. et al. Computer-aided molecular design using genetic algorithms. Comput. Chem. Eng. 1994; 18: 833-844], GroupBuild [66. Rotstein S.H. Murcko M.A. GroupBuild: a fragment-based method for de novo drug design. J. Med. Chem. 1993; 36: 1700-1710], BREED [58. Pierce A.C. et al. BREED: generating novel inhibitors through hybridization of known ligands. Application to CDK2, P38, and HIV protease. J. Med. Chem. 2004; 47: 2768-2775], GraphGA [62. Jensen J.H. A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space. Chem. Sci. 2019; 10: 3567-3572], GEGL [63. Ahn S. et al. Guiding deep molecular optimization with genetic exploration. arXiv. 2020; (Published online July 4, 2020. http://arxiv.org/abs/2007.04897)]
  Constrained: SYNOPSIS [91. Vinkers H.M. et al. SYNOPSIS: SYNthesize and OPtimize System in Silico. J. Med. Chem. 2003; 46: 2765-2773], Flux [88. Fechner U. Schneider G. Flux (1): a virtual synthesis scheme for fragment-based de novo design. J. Chem. Inf. Model. 2006; 46: 699-707], MOARF [89. Firth N.C. et al. MOARF, an integrated workflow for multi-objective optimization: implementation, synthesis, and biological evaluation. J. Chem. Inf. Model. 2015; 55: 1169-1180], DOGS [92. Hartenfeller M. et al. DOGS: reaction-driven de novo design of bioactive compounds. PLoS Comput. Biol. 2012; 8: e1002380]

On the fly via machine learning
  Unconstrained: SMILES VAE [118. Gomez-Bombarelli R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 2018; 4: 268-276], JT-VAE [75. Jin W. et al. Junction tree variational autoencoder for molecular graph generation. arXiv. 2018; (Published online February 12, 2018. https://arxiv.org/abs/1802.04364)], SMILES RNN [72. Segler M.H.S. et al. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 2018; 4: 120-131, 73. Olivecrona M. et al. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 2017; 9: 48], MolDQN [77. Zhou Z. et al. Optimization of molecules via deep reinforcement learning. arXiv. 2018; (Published online October 19, 2018. http://arxiv.org/abs/1810.08678)]
  Constrained: MoleculeChef [96. Bradshaw J. et al. A model to search for synthesizable molecules. arXiv. 2019; (Published online June 12, 2019. http://arxiv.org/abs/1906.05221)], ChemBO [97. Korovina K. ChemBO: Bayesian optimization of small organic molecules with synthesizable recommendations. arXiv. 2019; (Published online August 5, 2019. http://arxiv.org/abs/1908.01425)], PGFS [98. Gottipati S.K. et al. Learning to navigate the synthetically accessible chemical space using reinforcement learning. arXiv. 2020; (Published online April 26, 2020. https://arxiv.org/abs/2004.12485v1)], REACTOR [99. Horwood J. Noutahi E. Molecular design in synthetically accessible chemical space via deep reinforcement learning. arXiv. 2020; (Published online April 29, 2020. https://arxiv.org/abs/2004.14308v1)]

(a) Spaces can be defined prior to exploration or defined on the fly by evolutionary and/or machine learning-based methods. They can be relatively unconstrained (i.e., only in terms of validity) or constrained by availability (i.e., in terms of purchasability or synthesizability).

One approach to molecular discovery is to explore a predefined chemical space: an enumerated list of candidate molecules. In this setting, the two stages of (i) defining the space and (ii) exploring the space are entirely decoupled. Formally, we might think about this problem as an optimization of an objective function f(x), where x is a molecule belonging to a discrete set X. Defining or selecting a finite chemical space often relies on domain expertise. Careful selection of X can increase the likelihood that it contains a high-performing molecule while minimizing the number of low-performing compounds.

Common databases of molecules for computational screening are: ZINC [13], a library of commercially available compounds; PubChem [14], molecules with biological relevance; ChEMBL [15], molecules with bioactivity data; and DrugBank [16], approved or experimental therapeutic molecules. These virtual libraries (see Glossary) all represent 'general-purpose' chemical spaces with broad biological relevance and are therefore applied to many problems related to drug discovery [17. Walters W.P. Virtual chemical libraries. J. Med. Chem. 2019; 62: 1116-1124].

More focused chemical spaces can be created through a domain-informed enumeration of compounds relevant to a specific application; for example, 1.6 million donor-bridge-acceptor trimers for organic electronics [18. Gomez-Bombarelli R. et al. Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. Nat. Mater. 2016; 15: 1120-1127] or 2.8 million transition-metal complexes for redox flow batteries [19. Janet J.P. et al. Accurate multiobjective design in a space of millions of transition metal complexes with neural-network-driven efficient global optimization. ACS Cent. Sci. 2020; 6: 513-524]. These are exhaustively enumerated chemical spaces with strict constraints on which fragments are included and how they are attached, similar to R-group enumeration methods. Privileged fragments for drug-like molecules have been identified through retrosynthetic analysis and automatic fragmentation [20. Lewell X.Q. et al. RECAP – retrosynthetic combinatorial analysis procedure: a powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry. J. Chem. Inform. Comput. Sci. 1998; 38: 511-522, 21. Ertl P. Cheminformatics analysis of organic substituents: identification of the most common substituents, calculation of substituent properties, and automatic identification of drug-like bioisosteric groups. J. Chem. Inform. Comput. Sci. 2003; 43: 374-380]; the molecules produced by recombining these fragments are intended to look more promising than an enumeration based on graph structure alone.

Graph-theoretical enumeration of molecular structures has been studied for over a century, starting with simple spaces like that of acyclic alkanes [22. Cayley E. Ueber die analytischen Figuren, welche in der Mathematik Bäume genannt werden und ihre Anwendung auf die Theorie chemischer Verbindungen. Ber. Dtsch. Chem. Ges. 1875; 8: 1056-1059 (in German), 23. Henze H.R. Blair C.M. The number of isomeric hydrocarbons of the methane series. J. Am. Chem. Soc. 1931; 53: 3077-3085]. However, it is only recently that these structures have been recorded, evaluated, and used for discovery. The Chemical Space Project exemplifies modern exhaustive enumeration of all stable organic molecules containing common atom types up to a certain size [24]. Since the original Generated DataBase (GDB) of up to seven heavy atoms [25. Fink T. Reymond J.-L. Virtual exploration of the chemical universe up to 11 atoms of C, N, O, F: assembly of 26.4 million structures (110.9 million stereoisomers) and analysis for new ring systems, stereochemistry, physicochemical properties, compound classes, and drug discovery. J. Chem. Inf. Model. 2007; 47: 342-353], Reymond and coworkers have enumerated, analyzed, and released the 166.4 billion structures of up to 17 heavy atoms [26. Ruddigkeit L. et al. Enumeration of 166 billion organic small molecules in the Chemical Universe database GDB-17. J. Chem. Inf. Model. 2012; 52: 2864-2875] and published numerous visualizations and analyses thereof.

In addition to the benefits of ensuring that X is relevant to the design objective, the predefinition of chemical spaces lets us impose arbitrary constraints on their contents. A practical constraint is the ease of experimental validation: that any candidate can be physically acquired for experimental testing. In the simplest case, a chemical space could be defined as the set of molecules in a company's chemical inventory or vendor catalog. Any compound from this list can be acquired rapidly for experimental evaluation.

Accessibility is the primary motivation for make-on-demand libraries, which are chemical spaces defined as the molecules that are in stock or available and all molecules that can be produced from those structures through straightforward synthetic protocols. Libraries are often enumerated by applying a small number (<100) of reaction templates defining common single-step transformations to all possible combinations of starting materials [27. Cramer R.D. et al. Virtual compound libraries: a new approach to decision making in molecular discovery research. J. Chem. Inform. Comput. Sci. 1998; 38: 1010-1023, 28. Nikitin S. et al. A very large diversity space of synthetically accessible compounds for use with drug design programs. J. Comput. Aided Mol. Des. 2005; 19: 47-63, 29. Cramer R.D. et al. AllChem: generating and searching 10^20 synthetically accessible structures. J. Comput. Aided Mol. Des. 2007; 21: 341-350, 30. Patel H. et al. Knowledge-based approach to de novo design using reaction vectors. J. Chem. Inf. Model. 2009; 49: 1163-1184] (Figure 1); recursive enumeration generates molecules accessible through multiple synthetic steps. There are numerous implementations of this approach [31. Hoffmann T. Gastreich M. The next level in chemical space navigation: going far beyond enumerable compound libraries. Drug Discov. Today. 2019; 24: 1148-1156], including SAVI [32], efforts within pharmaceutical companies [33,34], and efforts from commercial vendors (https://enamine.net/library-synthesis/real-compounds; https://www.labnetwork.com/frontend-app/p/%5C#!/library/virtual). As it becomes impractical to store such large numbers of compounds due to the combinatorial explosion of reaction products, these spaces may be defined implicitly. Whether molecules in these spaces are easy to synthesize depends on the robustness of rules used for enumeration.
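As a rough illustration of template-based enumeration and of what it means for a library to be defined implicitly, the following minimal sketch counts and lazily generates the products of a toy make-on-demand space. Everything in it is an assumption made for illustration: the building-block names, the two templates, and the representation of a product as a (template, reactants) pair rather than a real chemical structure. A practical implementation would apply reaction templates to actual structures with a cheminformatics toolkit.

```python
from itertools import product
from math import prod

# Hypothetical building blocks grouped by functional class (placeholder names).
BUILDING_BLOCKS = {
    "acid": ["acid_1", "acid_2", "acid_3"],
    "amine": ["amine_1", "amine_2"],
    "boronic_acid": ["boronic_1", "boronic_2"],
    "aryl_halide": ["halide_1", "halide_2", "halide_3", "halide_4"],
}

# Hypothetical single-step templates: name -> reactant classes they combine.
TEMPLATES = {
    "amide_coupling": ("acid", "amine"),
    "suzuki_coupling": ("boronic_acid", "aryl_halide"),
}


def library_size(templates, blocks):
    """Count products without enumerating them (one product per reactant combination)."""
    return sum(
        prod(len(blocks[cls]) for cls in classes) for classes in templates.values()
    )


def enumerate_library(templates, blocks):
    """Lazily yield every (template, reactants) pair; nothing is stored in memory."""
    for name, classes in templates.items():
        for reactants in product(*(blocks[cls] for cls in classes)):
            yield name, reactants


def contains(candidate, templates, blocks):
    """Membership test for the implicit space: is this product reachable by a template?"""
    name, reactants = candidate
    classes = templates.get(name)
    if classes is None or len(classes) != len(reactants):
        return False
    return all(r in blocks[cls] for r, cls in zip(reactants, classes))
```

Because the size is a sum of products of building-block counts, it grows multiplicatively as blocks are added, which is why very large libraries are more practical to describe by their rules (as in `contains`) than to store as explicit lists.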
Lyu and colleagues cite an 86% synthesis success rate for 51 compounds selected from 170 million in the Enamine REAL library enumerated from 130 reaction types; WuXi estimates a 60–80% success rate for their 1.7-billion-member collection generated by 30 reaction types (https://www.labnetwork.com/frontend-app/p/%5C#!/library/virtual). This success rate might be improved through the use of machine-learning models for reaction outcome prediction [35. Coley C.W. et al. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci. 2019; 10: 370-377, 36. Schwaller P. et al. Molecular Transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent. Sci. 2019; 5: 1572-1583], which for common reaction types exhibit accuracies above 90% on benchmark datasets. These neural models can be directly used to enumerate possible products or used to predict regio/stereoselectivity patterns [37. Tomberg A. et al. A predictive tool for electrophilic aromatic substitutions using machine learning. J. Org. Chem. 2019; 84: 4695-4703, 38. Beker W. et al. Prediction of major regio-, site-, and diastereoisomers in Diels–Alder reactions by using machine-learning: the importance of physically meaningful descriptors. Angew. Chem. Int. Ed. 2019; 58: 4515-4519, 39. Struble T.J. et al. Multitask prediction of site selectivity in aromatic C–H functionalization reactions. React. Chem. Eng. 2020; 5: 896-902].

Once these spaces are defined, there are several approaches to identify the top-performing molecules within them. The simplest strategy is, of course, to evaluate every candidate molecule. The feasibility of this approach depends on the nature of the evaluation and time/cost constraints.
It would not be practical to physically test every compound in the ZINC database, but it could be for smaller collections like the Drug Repurposing Hub [40. Corsello S.M. et al. The Drug Repurposing Hub: a next-generation drug library and information resource. Nat. Med. 2017; 23: 405-408] or the NCATS Pharmaceutical Collection [41. Huang R. et al. The NCATS Pharmaceutical Collection: a 10-year update. Drug Discov. Today. 2019; 24: 2341-2349]. It is worth noting that technologies like DNA-encoded libraries [42. Clark M.A. et al. Design, synthesis and selection of DNA-encoded small-molecule libraries. Nat. Chem. Biol. 2009; 5: 647-654] and phage display [43. Smith G.P. Petrenko V.A. Phage display. Chem. Rev. 1997; 97: 391-410] can be used to physically screen chemical spaces of trillions of molecules, albeit with a sparse and stochastic readout.

If evaluation is computational, practicality is simply a question of computational budget. In one of the largest docking studies reported to date, 138 million and 99 million compounds from the Enamine REAL library were docked against the D4 receptor and AmpC, respectively [44. Lyu J. et al. Ultra large library docking for discovering new chemotypes. Nature. 2019; 566: 224-229]. More recent studies have since screened over 1 billion enumerated molecules from the same database [45. Gorgulla C. et al. An open-source drug discovery platform enables ultra-large virtual screens. Nature. 2020; 580: 663-668, 46. Acharya A. et al. Supercomputer-based ensemble docking drug discovery pipeline with application to Covid-19. ChemRxiv. 2020; (Published online July 29, 2020. https://doi.org/10.26434/chemrxiv.12725465.v1)].
As make-on-demand libraries can exceed this scale by multiple orders of magnitude, we argue that such exhaustive screening techniques are not a viable long-term approach even for inexpensive evaluations like docking. A popular framework to reduce overall cost is active learning through iterative, model-guided optimization [47. Settles B. Active learning. Synth. Lect. Artif. Intell. Mach. Learn. 2012; 6: 1-114]. This involves selecting subsets of experiments to perform based on predictions from a quantitative structure–property relationship (QSPR) model: a surrogate model f̂(x) that codifies an approximation to f(x). In Bayesian optimization, predictions of performance and model uncertainty are both considered to balance the exploration of uncertain candidates and the exploitation of candidates likely to be high performing [48. Frazier P.I. A tutorial on Bayesian optimization. arXiv. 2018; (Published online July 8, 2018. https://arxiv.org/abs/1807.02811v1)]; simpler optimization schemes may simply perform a greedy search.

Examples of this paradigm include the platform Eve for the identification of bioactive molecules [49. Williams K. et al. Cheaper faster drug development validated by the repositioning of drugs against neglected tropical diseases. J. R. Soc. Interface. 2015; 12: 20141289], retrospective identification of bioactive compounds using PubChem data [50. Kangas J.D. et al. Efficient discovery of responses of proteins to compounds using active learning. BMC Bioinformatics. 2014; 15: 143], computational screening of OLED-relevant molecules [18], and the selection of compounds for docking [51. Gentile F. et al. Deep Docking: a deep learning platform for augmentation of structure based drug discovery. ACS Cent. Sci. 2020; 6: 939-949]. There are still many limitations to be addressed related to the surrogate model, f̂, in terms of its low-data performance, generalization power, and ability to quantify uncertainty [52. Muratov E.N. et al. QSAR without borders. Chem. Soc. Rev. 2020; 49: 3525-3564], although methods for learning from graph-structured molecules are promising [53. Wu Z. et al. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020; (Published online March 24, 2020. https://doi.org/10.1109/TNNLS.2020.2978386)]. Algorithmic improvements to better handle variable evaluation costs (e.g., the cost of purchasing a compound) and batched optimization (e.g., parallelized in well plates or over multiple CPUs) would be beneficial. While multiple iterations lead to improved surrogate models, a one-iteration approach can still be very effective. In this way, a novel antibiotic was recently identified from a drug repurposing collection with fewer experiments than an exhaustive screen would require [54. Stokes J.M. et al. A deep learning approach to antibiotic discovery. Cell. 2020; 180: 688-702.e13].
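The iterative, model-guided loop described above can be sketched as follows for a predefined discrete set X. This is a minimal illustration under stated assumptions, not the method of any cited study: molecules are stood in for by a single numeric descriptor, `objective` is a toy replacement for an expensive evaluation f(x), and the surrogate is a simple kernel-weighted average with a distance-based uncertainty rather than the Gaussian process or neural model typically used in Bayesian optimization. The acquisition score (predicted mean plus `beta` times uncertainty) is an upper-confidence-bound rule; setting `beta=0` gives the greedy search mentioned in the text.

```python
import math
import random


def objective(x):
    """Toy stand-in for an expensive evaluation f(x) (hypothetical)."""
    return -((x - 0.7) ** 2)


def surrogate(x, observed, length_scale=0.1):
    """Return (mean, uncertainty) for candidate x given observed {x_i: f(x_i)}.

    Mean: kernel-weighted average of observed values, falling back to the
    overall observed mean far from any data. Uncertainty: 0 at an observed
    point, approaching 1 far from all observed points.
    """
    weights = {xi: math.exp(-(((x - xi) / length_scale) ** 2)) for xi in observed}
    prior_mean = sum(observed.values()) / len(observed)
    eps = 1e-12
    weighted = sum(w * observed[xi] for xi, w in weights.items())
    mean = (weighted + eps * prior_mean) / (sum(weights.values()) + eps)
    uncertainty = 1.0 - max(weights.values())
    return mean, uncertainty


def explore(candidates, f, n_init=3, n_rounds=5, batch_size=2, beta=0.5, seed=0):
    """Batched model-guided search over a discrete candidate list."""
    rng = random.Random(seed)
    # Start from a few randomly chosen evaluations.
    observed = {x: f(x) for x in rng.sample(candidates, n_init)}
    for _ in range(n_rounds):
        pool = [x for x in candidates if x not in observed]
        if not pool:
            break

        def score(x):
            mean, uncertainty = surrogate(x, observed)
            return mean + beta * uncertainty  # exploitation + exploration

        # Evaluate the top-scoring batch, then refit (here: just update the data).
        for x in sorted(pool, key=score, reverse=True)[:batch_size]:
            observed[x] = f(x)
    return observed


CANDIDATES = [i / 50 for i in range(51)]  # toy discrete space X
observed = explore(CANDIDATES, objective)
best_x = max(observed, key=observed.get)
```

With these defaults, only 13 of the 51 candidates are evaluated (3 initial plus 5 rounds of 2). Whether the best candidate is found within that budget depends on the surrogate, the acquisition rule, and the initial sample.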