Improved GNNs for Log D7.4 Prediction by Transferring Knowledge from Low-Fidelity Data

Computer science, Artificial intelligence, Fidelity, Machine learning, Data mining, Telecommunications
Authors
Yanjing Duan, Li Fu, Xiaochen Zhang, Teng-Zhi Long, He Yuan-Hang, Zhaoqian Liu, Aiping Lü, Yafeng Deng, Chang‐Yu Hsieh, Tingjun Hou, Dongsheng Cao
Source
Journal: Journal of Chemical Information and Modeling [American Chemical Society]
Volume (issue) and pages: 63 (8): 2345-2359 Citations: 13
Identifier
DOI: 10.1021/acs.jcim.2c01564
Abstract

The n-octanol/buffer solution distribution coefficient at pH = 7.4 (log D7.4) is an indicator of lipophilicity, and it influences a wide variety of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties and the druggability of compounds. In log D7.4 prediction, graph neural networks (GNNs) can uncover subtle structure–property relationships (SPRs) by automatically extracting features from molecular graphs, but their performance is often limited by the small size of available datasets. Herein, we present a transfer learning strategy called pretraining on computational data and then fine-tuning on experimental data (PCFE) to fully exploit the predictive potential of GNNs. PCFE works by pretraining a GNN model on 1.71 million computational log D data points (low-fidelity data) and then fine-tuning it on 19,155 experimental log D7.4 data points (high-fidelity data). Experiments on three GNN architectures (graph convolutional network (GCN), graph attention network (GAT), and Attentive FP) demonstrated the effectiveness of PCFE in improving GNNs for log D7.4 prediction. Moreover, the optimal PCFE-trained GNN model (cx-Attentive FP, test-set R² = 0.909) outperformed four descriptor-based models (random forest (RF), gradient boosting (GB), support vector machine (SVM), and extreme gradient boosting (XGBoost)). The robustness of the cx-Attentive FP model was also confirmed by evaluating it with different training data sizes and dataset splitting strategies. We therefore developed a webserver and defined the applicability domain for this model. The webserver (http://tools.scbdd.com/chemlogd/) provides free log D7.4 prediction services. In addition, the important descriptors for log D7.4 were identified by the Shapley additive explanations (SHAP) method, and the substructures most relevant to log D7.4 were identified by the attention mechanism. Finally, matched molecular pair analysis (MMPA) was performed to summarize the contributions of common chemical substituents to log D7.4, including a variety of hydrocarbon groups, halogen groups, heteroatoms, and polar groups. In conclusion, we believe that the cx-Attentive FP model can serve as a reliable tool to predict log D7.4 and hope that pretraining on low-fidelity data can help GNNs make accurate predictions of other endpoints in drug discovery.
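As a rough illustration of the pretrain-then-fine-tune idea behind PCFE, the sketch below swaps the GNN for a one-feature linear model and the molecular datasets for synthetic data. It is not the authors' implementation: the function names (`make_data`, `fit`, `run_demo`), the dataset sizes, the bias and noise of the "low-fidelity" labels, and the learning rate and epoch counts are all assumptions chosen only to make the two-stage procedure visible.

```python
import random


def make_data(n, slope, intercept, noise, rng):
    """Synthetic (x, y) pairs with y = slope * x + intercept + Gaussian noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def fit(xs, ys, w, b, lr, epochs):
    """Full-batch gradient descent on mean squared error, starting from (w, b)."""
    n = len(xs)
    for _ in range(epochs):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the linear model (w, b) on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def run_demo(seed=0):
    rng = random.Random(seed)
    # Hypothetical stand-ins: "computational" labels are plentiful but biased
    # and noisy; "experimental" labels are scarce but accurate.
    low_x, low_y = make_data(2000, slope=1.8, intercept=0.7, noise=0.3, rng=rng)
    high_x, high_y = make_data(20, slope=2.0, intercept=1.0, noise=0.05, rng=rng)
    test_x, test_y = make_data(200, slope=2.0, intercept=1.0, noise=0.05, rng=rng)

    # Stage 1: pretrain on low-fidelity data from a zero initialization.
    w_pre, b_pre = fit(low_x, low_y, 0.0, 0.0, lr=0.1, epochs=200)
    # Stage 2: fine-tune on high-fidelity data, starting from the pretrained weights.
    w_ft, b_ft = fit(high_x, high_y, w_pre, b_pre, lr=0.1, epochs=20)
    # Baseline: same short training budget on high-fidelity data, but from scratch.
    w_sc, b_sc = fit(high_x, high_y, 0.0, 0.0, lr=0.1, epochs=20)

    return {
        "pretrained_only": mse(test_x, test_y, w_pre, b_pre),
        "fine_tuned": mse(test_x, test_y, w_ft, b_ft),
        "from_scratch": mse(test_x, test_y, w_sc, b_sc),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: test MSE = {value:.4f}")
```

In this toy setting the fine-tuned model starts close to the right answer, so a short fine-tuning budget is enough to correct the bias inherited from the low-fidelity labels. In the paper's setting the same two stages would apply to a GNN such as Attentive FP, with the computational log D values as the pretraining labels and the experimental log D7.4 values as the fine-tuning labels.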