3DSMILES-GPT: 3D Molecular Pocket-based Generation with Token-only Large Language Model

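Keywords: Security token, Computer science, Drug discovery, Molecule, Reinforcement learning, First generation, Biological system, Artificial intelligence, Chemistry, Biology, Population, Biochemistry, Demography, Computer security, Organic chemistry, Sociology
Authors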
Jike Wang,Hao Luo,Rui Qin,Mingyang Wang,Xiaozhe Wan,Meijing Fang,Odin Zhang,Qiaolin Gou,Qun Su,Chao Shen,Ziyi You,Liwei Liu,Chang‐Yu Hsieh,Tingjun Hou,Yu Kang
Source
Journal: Chemical Science [The Royal Society of Chemistry]
Citations: 1
Identifier
DOI:10.1039/d4sc06864e
Abstract

Generating three-dimensional (3D) molecules conditioned on target structures is a cutting-edge challenge in drug discovery. Many existing approaches produce molecules with invalid configurations, unphysical conformations, suboptimal drug-likeness, or limited synthesizability, and they require long generation times. To address these challenges, we present 3DSMILES-GPT, a fully language-model-driven framework for 3D molecular generation that uses tokens exclusively. We treat both two-dimensional (2D) and 3D molecular representations as linguistic expressions, combine them into full-dimensional representations, and pre-train the model on a dataset of tens of millions of drug-like molecules. This token-only approach enables the model to learn the 2D and 3D characteristics of molecules across a large collection of compounds. Subsequently, we fine-tune the model on paired structural data of protein pockets and molecules, and then apply reinforcement learning to further optimize the biophysical and chemical properties of the generated molecules. Experimental results demonstrate that molecules generated by 3DSMILES-GPT outperform those from existing methods in binding affinity, drug-likeness (QED), and synthetic accessibility score (SAS). Notably, the model achieves a 33% improvement in QED while the binding affinity estimated by Vina docking remains state of the art. Generation is also fast, averaging approximately 0.45 seconds per generation, a threefold speedup over the fastest existing methods. The 3DSMILES-GPT approach has the potential to positively impact the generation of 3D molecules in drug discovery.
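To make the token-only idea concrete, the sketch below shows one way a 2D SMILES string and a set of 3D coordinates could be serialized into a single token sequence for a language model. The abstract does not specify the actual vocabulary or layout used by 3DSMILES-GPT, so everything here is an assumption for illustration: the `<2D>`, `<3D>`, and `<EOS>` markers, the regex tokenizer, the one-decimal coordinate rounding, and the function names are all hypothetical.

```python
import re

# Hypothetical SMILES tokenizer (an assumption, not the paper's vocabulary):
# bracket atoms, two-letter halogens, two-digit ring closures, then any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of string tokens."""
    return SMILES_TOKEN.findall(smiles)


def coord_tokens(coords, decimals=1):
    """Render each (x, y, z) coordinate as fixed-precision text tokens."""
    tokens = []
    for x, y, z in coords:
        tokens.extend(f"{value:.{decimals}f}" for value in (x, y, z))
    return tokens


def build_sequence(smiles, coords):
    """Concatenate the 2D and 3D parts into one sequence.

    The marker tokens are placeholders chosen for this sketch only.
    """
    return ["<2D>"] + tokenize_smiles(smiles) + ["<3D>"] + coord_tokens(coords) + ["<EOS>"]


# Toy example: methanol heavy atoms with made-up coordinates (in angstroms).
sequence = build_sequence("CO", [(0.0, 0.0, 0.0), (1.43, 0.0, 0.0)])
print(sequence)
```

Because both parts are plain tokens, a standard autoregressive decoder could in principle be trained on such sequences without any geometry-specific layers, which is the property the abstract emphasizes.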