Assessment of the Generalization Abilities of Machine-Learning Scoring Functions for Structure-Based Virtual Screening

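SASA, generalization, random forest, similarity (geometry), computational biology, cluster analysis, artificial intelligence, computer science, machine learning, virtual screening, data mining, bioinformatics, mathematics, biology, drug discovery, image (mathematics), mathematical analysis, paleontology
Authors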
Hui Zhu,Jincai Yang,Niu Huang
Source
Journal: Journal of Chemical Information and Modeling [American Chemical Society]
Volume (issue): 62 (22): 5485-5502 Citations: 18
Identifier
DOI:10.1021/acs.jcim.2c01149
Abstract

In structure-based virtual screening (SBVS), it is critical that scoring functions capture protein–ligand atomic interactions. By focusing on the local domains of ligand binding pockets, a standardized pocket Pfam-based clustering (Pfam-cluster) approach was developed to assess the cross-target generalization ability of machine-learning scoring functions (MLSFs). Subsequently, 12 typical MLSFs were evaluated using random cross-validation (Random-CV), protein sequence similarity-based cross-validation (Seq-CV), and pocket Pfam-based cross-validation (Pfam-CV) methods. Surprisingly, all of the tested models showed decreased performances from Random-CV to Seq-CV to Pfam-CV experiments, not showing satisfactory generalization capacity. Our interpretable analysis suggested that the predictions on novel targets by MLSFs were dependent on buried solvent-accessible surface area (SASA)-related features of complex structures, with greater predicted binding affinities on complexes owning larger protein–ligand interfaces. By combining buried SASA-related features with target-specific patterns that were only shared among structurally similar compounds in the same cluster, the random forest (RF)-Score attained a good performance in the Random-CV test. Based on these findings, we strongly advise assessing the generalization ability of MLSFs with the Pfam-cluster approach and being cautious with the features learned by MLSFs.
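The difference between Random-CV and a cluster-based scheme such as Seq-CV or Pfam-CV comes down to how complexes are assigned to folds. The following is a minimal sketch of that idea, not the authors' pipeline: the cluster labels, the fold count, and the helper names (`random_folds`, `grouped_folds`, `shared_groups`) are all hypothetical, and how the paper derives its Pfam clusters is not reproduced here.

```python
import random
from collections import defaultdict


def random_folds(n_samples, k, seed=0):
    """Random-CV style: shuffle sample indices and deal them into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grouped_folds(groups, k):
    """Cluster-based CV style: all samples of one cluster share a fold."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    folds = [[] for _ in range(k)]
    # Place the largest clusters first, each into the currently smallest fold.
    for g in sorted(members, key=lambda g: (-len(members[g]), g)):
        min(folds, key=len).extend(members[g])
    return folds


def shared_groups(folds, groups):
    """Count clusters that occur in more than one fold (0 means no overlap)."""
    seen = defaultdict(set)
    for f, fold in enumerate(folds):
        for i in fold:
            seen[groups[i]].add(f)
    return sum(1 for fold_ids in seen.values() if len(fold_ids) > 1)


# Hypothetical cluster labels for 12 protein-ligand complexes.
clusters = ["kinase"] * 5 + ["protease"] * 4 + ["GPCR"] * 3

print("random  :", shared_groups(random_folds(len(clusters), 3), clusters))
print("grouped :", shared_groups(grouped_folds(clusters, 3), clusters))
```

With random assignment, members of the same cluster end up in both training and test folds, so a model can score well by recognizing target-specific patterns. Grouped assignment keeps each cluster entirely on one side of the split, which is what makes the test a measure of cross-target generalization.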
