Computer science
Embedding
Encoding
Property (philosophy)
Artificial intelligence
Chemistry
Discriminative model
Position (finance)
Representation (politics)
Deep learning
Drug discovery
Machine learning
Natural language processing
Epistemology
Finance
Politics
Political science
Law
Economics
Gene
Philosophy
Biochemistry
Authors
Yunwu Liu, Ruisheng Zhang, Tongfeng Li, Jing Jiang, Jun Ma, Ping Wang
Identifier
DOI:10.1016/j.jmgm.2022.108344
Abstract
Molecular property prediction is a significant task in drug discovery. Most deep learning-based computational methods either develop unique chemical representations or combine complex models. However, researchers have paid less attention to the possible advantages of enormous quantities of unlabeled molecular data. Because the amount of labeled data available is clearly limited, this task becomes more difficult. In some sense, the SMILES of a drug molecule may be regarded as a language for chemistry, taking inspiration from natural language processing research and recent advances in pretrained models. In this paper, we incorporate Rotary Position Embedding (RoPE) to efficiently encode the position information of SMILES sequences, ultimately enhancing the capability of the BERT pretrained model to extract latent molecular substructure information for molecular property prediction. We propose MolRoPE-BERT, a new end-to-end deep learning framework that integrates an efficient position-encoding approach for capturing sequence position information with a pretrained BERT model for molecular property prediction. To generate useful molecular substructure embeddings, we first pretrain MolRoPE-BERT exclusively on four million unlabeled drug SMILES (i.e., from ZINC 15 and ChEMBL 27). We then conduct a series of experiments to evaluate the performance of the proposed MolRoPE-BERT on four well-studied datasets. Compared with conventional and state-of-the-art baselines, our experiments demonstrate comparable or superior performance.
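The abstract's core idea is to encode the position of each SMILES token with Rotary Position Embedding (RoPE), which rotates each pair of embedding dimensions by a position-dependent angle so that attention scores depend on relative positions. The sketch below is a minimal, generic illustration of RoPE applied to a toy sequence of token embeddings; the array shapes, `base` constant, and the example sequence length are standard RoPE conventions, not details taken from the MolRoPE-BERT paper itself.

```python
import numpy as np

def rotary_position_embedding(x, base=10000.0):
    """Apply RoPE to token embeddings x of shape (seq_len, dim), dim even.

    Each consecutive pair of dimensions (2i, 2i+1) is rotated by the angle
    position * base**(-2i/dim), so position 0 is left unchanged and the
    inner product of two rotated vectors depends only on their relative
    positions.
    """
    seq_len, dim = x.shape
    # One frequency per dimension pair
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    # Rotation angle for every (position, pair) combination
    angles = np.outer(np.arange(seq_len), inv_freq)   # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                   # split into pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                # 2-D rotation
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Toy example: 3 token embeddings (e.g. for the SMILES "CCO") of width 8
emb = np.random.randn(3, 8)
rot = rotary_position_embedding(emb)
print(rot.shape)  # (3, 8)
```

Because RoPE is a pure rotation, it preserves each embedding's norm and adds no learned parameters, which is one reason it can be dropped into a BERT-style encoder in place of learned absolute position embeddings.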