Molecular language models: RNNs or transformer?

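Transformer, Computer science, Artificial intelligence, Biology, Computational biology, Engineering, Electrical engineering, Voltage
Authors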
Yangyang Chen,Zixu Wang,Xiangxiang Zeng,Yayang Li,Pengyong Li,Xiucai Ye,Tetsuya Sakurai
Source
Journal: Briefings in Functional Genomics [Oxford University Press]
Volume (Issue): 22 (4): 392-400 Citations: 13
Identifier
DOI:10.1093/bfgp/elad012
Abstract

Language models have shown the capacity to learn complex molecular distributions. In molecular generation, they are designed to explore the distribution of molecules, and previous studies have demonstrated their ability to learn molecule sequences. Early on, recurrent neural networks (RNNs) were widely used for feature extraction from sequence data and were applied to various molecule generation tasks. In recent years, the attention mechanism for sequence data has become popular. It captures the underlying relationships between words and is widely applied in language models. The Transformer-Layer, a model based on the self-attention mechanism, performs comparably to RNN-based models. In this research, we investigated the differences between RNNs and the Transformer-Layer in learning more complex distributions of molecules. For this purpose, we experimented with three generative tasks: the distribution of molecules with elevated penalized LogP scores, multimodal distributions of molecules, and the largest molecules in PubChem. We evaluated the models on molecular properties, basic metrics, Tanimoto similarity, etc. In addition, we applied two different molecular representations, SMILES and SELFIES. The results show that both language models can learn complex molecular distributions and that the SMILES-based representation performs better than SELFIES. The choice between RNNs and the Transformer-Layer should be based on the characteristics of the dataset. RNNs work better on data dominated by local features, and their performance decreases on multi-distribution data, while the Transformer-Layer is more suitable for molecules with larger molecular weights and for data dominated by global features.
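
The abstract names Tanimoto similarity as one of the evaluation measures. As a rough illustration only, the sketch below computes it on sets of "on" fingerprint bits. The helper `ngram_fingerprint` is a hypothetical toy stand-in built from character n-grams of a SMILES string; it is not the fingerprint used in the paper, and real evaluations would typically rely on chemically meaningful fingerprints from a cheminformatics toolkit.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity: |A ∩ B| / |A ∪ B| for two feature sets."""
    if not a and not b:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(a | b)


def ngram_fingerprint(smiles: str, n: int = 2) -> set:
    """Hypothetical toy fingerprint: the set of character n-grams in a SMILES string.

    This ignores chemistry entirely and only serves to produce feature sets
    for the similarity function above.
    """
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


if __name__ == "__main__":
    ethanol = ngram_fingerprint("CCO")      # features: "CC", "CO"
    ethylamine = ngram_fingerprint("CCN")   # features: "CC", "CN"
    # One shared feature out of three distinct features overall.
    print(tanimoto(ethanol, ethylamine))
```

A value of 1 means the two feature sets are identical and 0 means they share nothing, so the measure can be used to compare generated molecules against a reference set under whatever fingerprint is chosen.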
