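Computer science
Artificial intelligence
Feature learning
Representation (politics)
Encoder
Graph
Natural language processing
Machine learning
Stacking
Embedding
Set (abstract data type)
Unsupervised learning
Pattern recognition (psychology)
Theoretical computer science
Physics
Politics
Political science
Law
Programming language
Operating system
Nuclear magnetic resonance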
Authors
Gabriel A. Pinheiro,Juarez L. F. Da Silva,Marcos G. Quiles
Identifier
DOI:10.1021/acs.jcim.2c00521
Abstract
Machine learning as a tool for chemical space exploration broadens the horizons for working with known and unknown molecules. At its core lies molecular representation, a key to improving the learning of structure-property relationships. Recently, contrastive frameworks have shown impressive results for representation learning in diverse domains. This paper therefore proposes a contrastive framework that embraces multimodal molecular data. Specifically, our approach jointly trains a graph encoder and an encoder for the simplified molecular-input line-entry system (SMILES) string to perform the contrastive learning objective. Since SMILES is the basis of our method, i.e., we build the molecular graph from the SMILES string, we call our framework SMILES Contrastive Learning (SMICLR). When stacking a nonlinear regressor on SMICLR's pretrained encoder and fine-tuning the entire model, we reduced the prediction error by, on average, 44% and 25% for the energetic and electronic properties of the QM9 data set, respectively, relative to the supervised baseline. We further improved our framework's performance by applying data augmentations to each molecular-input representation. Moreover, SMICLR demonstrated competitive representation learning results in an unsupervised setting.
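The contrastive objective mentioned in the abstract can be illustrated with a small sketch. The abstract does not state which loss SMICLR uses, so the code below assumes an NT-Xent-style (SimCLR-like) loss. The function name `nt_xent_loss`, the `temperature` value, and the random arrays standing in for graph-encoder and SMILES-encoder outputs are all hypothetical. The idea shown is that the two embeddings of the same molecule form a positive pair, while every other embedding in the batch acts as a negative.

```python
import numpy as np


def nt_xent_loss(z_graph, z_smiles, temperature=0.1):
    """NT-Xent-style loss between two views of the same batch.

    z_graph, z_smiles: arrays of shape (n, d); row i of each array is
    assumed to be an embedding of the same molecule i (hypothetical
    outputs of a graph encoder and a SMILES encoder).
    """
    n = z_graph.shape[0]
    # Stack both views and L2-normalise so dot products are cosine similarities.
    z = np.concatenate([z_graph, z_smiles], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)

    # Pairwise similarities scaled by the temperature.
    sim = (z @ z.T) / temperature
    # An embedding must not be compared with itself.
    np.fill_diagonal(sim, -np.inf)

    # The positive for row i is row i + n, and vice versa.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average negative log-probability assigned to the positive pairs.
    return -log_prob[np.arange(2 * n), positives].mean()


# Toy usage: random vectors stand in for real encoder outputs.
rng = np.random.default_rng(0)
graph_emb = rng.normal(size=(8, 16))
smiles_emb = graph_emb + 0.01 * rng.normal(size=(8, 16))
print(nt_xent_loss(graph_emb, smiles_emb))
```

In this toy setting the loss is small when the two views of each molecule agree and grows when the pairing is scrambled, which is the behaviour that pushes two jointly trained encoders toward a shared representation.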