Computer science
Modality (human-computer interaction)
Pattern
Artificial intelligence
Mask (illustration)
Machine learning
Graph
Feature learning
Property (philosophy)
Training set
Theoretical computer science
Art
Philosophy
Epistemology
Visual arts
Social science
Sociology
Authors
Ao Shen,Mingzhi Yuan,Yingfan Ma,Jie Du,Manning Wang
Abstract
Self-supervised learning plays an important role in molecular representation learning because labeled molecular data are usually limited in many tasks, such as chemical property prediction and virtual screening. However, most existing molecular pre-training methods focus on a single modality of molecular data, and the complementary information in two important modalities, SMILES and graph, is not fully exploited. In this study, we propose an effective multi-modality self-supervised learning framework for molecular SMILES and graph data. Specifically, SMILES data and graph data are first tokenized so that they can be processed by a unified Transformer-based backbone network, which is trained with a masked reconstruction strategy. In addition, we introduce a specialized non-overlapping masking strategy to encourage fine-grained interaction between the two modalities. Experimental results show that our framework achieves state-of-the-art performance on a series of molecular property prediction tasks, and a detailed ablation study demonstrates the efficacy of the multi-modality framework and the masking strategy.
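A minimal sketch of what a non-overlapping masking strategy could look like, assuming the SMILES tokens and graph node tokens of the same molecule can be aligned at the atom level; the function name, the 15% mask ratio, and the alignment interface are illustrative assumptions, not the authors' implementation details, which the abstract does not specify.

```python
import random


def non_overlapping_masks(n_aligned, mask_ratio=0.15, seed=None):
    """Sample two disjoint sets of atom indices to mask, one per modality.

    Hypothetical illustration: an atom masked in the SMILES view stays
    visible in the graph view (and vice versa), so reconstructing the
    masked tokens pushes the model to use cross-modal information.
    """
    rng = random.Random(seed)
    positions = list(range(n_aligned))
    rng.shuffle(positions)
    k = max(1, int(round(mask_ratio * n_aligned)))
    smiles_masked = set(positions[:k])       # masked only in the SMILES view
    graph_masked = set(positions[k:2 * k])   # masked only in the graph view
    assert smiles_masked.isdisjoint(graph_masked)
    return smiles_masked, graph_masked


if __name__ == "__main__":
    s_mask, g_mask = non_overlapping_masks(n_aligned=20, mask_ratio=0.15, seed=0)
    print("SMILES-masked atom indices:", sorted(s_mask))
    print("graph-masked atom indices: ", sorted(g_mask))
```

In this toy version the two index sets are drawn from a single shuffled permutation, which guarantees they never overlap; how the actual framework aligns SMILES tokens with graph nodes and chooses mask ratios is not described in the abstract.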