L-MolGAN: An improved implicit generative model for large molecular graphs

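Molecular graph, generative model, molecular descriptor, generative adversarial network, computer science, graph, partition coefficient, generative grammar, biological system, theoretical computer science, artificial intelligence, machine learning, chemistry, quantitative structure-activity relationship, deep learning, chromatography, biology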

Authors

Yutaka Tsujimoto, Satoru Hiwa, Yushi Nakamura, Yohei Oe, Tomoyuki Hiroyasu

Links

chemrxiv.org, doi.org

Identifier

DOI: 10.26434/chemrxiv.14569545.v3

Abstract

Deep generative models are used to generate arbitrary molecular structures with desired chemical properties. MolGAN is a well-known molecular generation model that uses generative adversarial networks (GANs) and reinforcement learning to generate molecular graphs in one shot. MolGAN can effectively generate small molecular graphs with nine or fewer heavy atoms. However, the graphs tend to become disconnected as the molecular size increases. This poses a challenge for drug discovery and materials design, where the molecules of interest may be large. This study develops an improved MolGAN for large molecule generation (L-MolGAN). In this model, the connectivity of each molecular graph is evaluated by a depth-first search during training. When a disconnected molecular graph is generated, L-MolGAN assigns the graph a reward of zero. This procedure decreases the number of disconnected graphs and consequently increases the number of connected molecular graphs. The effectiveness of L-MolGAN is evaluated experimentally. The size and connectivity of the molecular graphs generated using data from the ZINC-250k molecular dataset are assessed, with MolGAN as the baseline model. The model is then optimized for the quantitative estimate of drug-likeness (QED) to generate drug-like molecules. The experimental results indicate that the connectivity measure of the generated molecular graphs improved by 1.96 compared with the baseline model at a larger maximum molecular size of 20 atoms. The molecules generated by L-MolGAN are evaluated in terms of multiple chemical properties that are important in drug design: QED, synthetic accessibility, and the log octanol–water partition coefficient. This result confirms that L-MolGAN can generate various drug-like molecules despite being optimized for a single property, i.e., QED. This method will contribute to the efficient discovery of new molecules larger than those generated with the existing method.
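The connectivity check and zero reward described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the graph encoding (a list of atom labels in which 0 marks an empty padding slot, plus a square bond-type matrix in which 0 means no bond), the function names `is_connected` and `connectivity_gated_reward`, and the `score` callable standing in for a property such as QED are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def is_connected(atoms: Sequence[int], bonds: Sequence[Sequence[int]]) -> bool:
    """Return True if all real atoms (label != 0) form one connected component."""
    # Assumption: label 0 marks a padding slot that holds no atom.
    present = [i for i, label in enumerate(atoms) if label != 0]
    if not present:
        return False  # assumption: an empty graph is treated as invalid

    seen = {present[0]}
    stack = [present[0]]
    while stack:  # iterative depth-first search from the first real atom
        i = stack.pop()
        for j in present:
            if j not in seen and bonds[i][j] != 0:
                seen.add(j)
                stack.append(j)
    return len(seen) == len(present)


def connectivity_gated_reward(
    atoms: Sequence[int],
    bonds: Sequence[Sequence[int]],
    score: Callable[[Sequence[int], Sequence[Sequence[int]]], float],
) -> float:
    """Give a disconnected graph a reward of zero; otherwise use `score`."""
    if not is_connected(atoms, bonds):
        return 0.0
    return score(atoms, bonds)


# Three atoms with a single bond between atoms 0 and 1: atom 2 is isolated.
atoms = [1, 1, 2]
bonds = [[0, 1, 0],
         [1, 0, 0],
         [0, 0, 0]]
reward = connectivity_gated_reward(atoms, bonds, lambda a, b: 0.8)
```

In this sketch the hypothetical `score` is never called for a disconnected graph, so any property estimate only contributes to the reward once the generated graph is a single connected molecule.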
