Computer science
Transformer
Leverage (statistics)
Encoding
Artificial intelligence
Chemical space
Machine learning
Natural language processing
Drug discovery
Voltage
Electrical engineering
Bioinformatics
Chemistry
Engineering
Biochemistry
Biology
Gene
Authors
Shengjie Luo, Tianlang Chen, Yixian Xu, Shuxin Zheng, Bo Liu, Liwei Wang, Di He
Source
Journal: Cornell University - arXiv
Date: 2022-01-01
Cited by: 2
Identifier
DOI: 10.48550/arxiv.2210.01765
Abstract
Unlike vision and language data, which usually have a unique format, molecules can naturally be characterized using different chemical formulations. One can view a molecule as a 2D graph or define it as a collection of atoms located in 3D space. For molecular representation learning, most previous works designed neural networks for only one particular data format, making the learned models likely to fail on other formats. We believe a general-purpose neural network model for chemistry should be able to handle molecular tasks across data modalities. To achieve this goal, in this work we develop a novel Transformer-based molecular model called Transformer-M, which can take molecular data in 2D or 3D format as input and generate meaningful semantic representations. Using the standard Transformer as the backbone architecture, Transformer-M develops two separate channels to encode 2D and 3D structural information and incorporates them with the atom features in the network modules. When the input data is in a particular format, the corresponding channel is activated and the other is disabled. By training on 2D and 3D molecular data with properly designed supervised signals, Transformer-M automatically learns to leverage knowledge from different data modalities and capture representations correctly. We conducted extensive experiments on Transformer-M. All empirical results show that Transformer-M can simultaneously achieve strong performance on 2D and 3D tasks, suggesting its broad applicability. The code and models will be made publicly available at https://github.com/lsj2408/Transformer-M.
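The two-channel mechanism the abstract describes can be sketched in a few lines: a structural bias is computed from whichever modality the input provides, and only that channel is active. This is an illustrative toy, not the authors' implementation; the function names, the shortest-path bias for the 2D channel, and the Euclidean-distance bias for the 3D channel are simplified assumptions.

```python
import math

def channel_2d(shortest_path_len):
    """2D channel (assumed form): bias from graph hop distances."""
    return [[-float(d) for d in row] for row in shortest_path_len]

def channel_3d(coords):
    """3D channel (assumed form): bias from negative Euclidean distances."""
    n = len(coords)
    return [[-math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]

def structural_bias(graph_2d=None, coords_3d=None):
    """Activate exactly the channel matching the input format,
    mirroring the paper's 'one channel on, the other disabled' rule."""
    if graph_2d is not None:
        return channel_2d(graph_2d)   # 2D input -> only the 2D channel runs
    if coords_3d is not None:
        return channel_3d(coords_3d)  # 3D input -> only the 3D channel runs
    raise ValueError("need a 2D graph or 3D coordinates")
```

For example, `structural_bias(coords_3d=[(0, 0, 0), (1, 0, 0)])` routes through the 3D channel and yields a pairwise distance-based bias, while passing a shortest-path matrix via `graph_2d` routes through the 2D channel instead; in the actual model, these biases would be added to the attention scores of a shared Transformer backbone.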