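Overfitting
Computer science
Property (philosophy)
Generalization
Construct (python library)
Benchmark (surveying)
Artificial intelligence
Task (project management)
Machine learning
Data mining
Engineering
Mathematics
Mathematical analysis
Philosophy
Geodesy
Epistemology
Systems engineering
Artificial neural network
Programming language
Geography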
Authors
Tianyi Jiang,Zeyu Wang,Jinhuan Wang,Jiafei Shao,Qi Xuan
Source
Journal: Communications in computer and information science
Date: 2023-01-01
Pages: 389-402
Identifier
DOI:10.1007/978-981-99-3925-1_26
Abstract
Recently, as applications of machine learning boom in biochemistry, data augmentation has demonstrated its power in molecular generation tasks. Specifically, data augmentation can effectively alleviate the model overfitting caused by insufficient training data in tasks such as molecular property prediction. However, existing works focus on the rationality of the augmented constructs but neglect the importance of molecular scaffolds for molecular property prediction. This paper analyzes the contribution of scaffolds in property prediction tasks and proposes a new augmentation technique that preserves functional groups and modifies molecular scaffolds during the augmentation process. Modifying scaffolds increases the diversity of molecules and thus enriches the dataset. At the same time, preserving the functional groups reduces the noise introduced, improves the quality of the augmented data, and enhances label invariance. We conducted experiments on four benchmark datasets using three baselines with different classification models to test the effectiveness of our proposed method. Our results demonstrate that data augmentation by modifying scaffolds can effectively improve property prediction performance and model generalization.