Computer science
Graph
Machine learning
Artificial intelligence
Training set
Predictive power
Set (abstract data type)
Proxy (statistics)
Data mining
Theoretical computer science
Epistemology
Philosophy
Programming language
Authors
Xinxin Yu, Yuanting Chen, Long Chen, Weihua Li, Yuhao Wang, Yun Tang, Guixia Liu
Identifiers
DOI:10.1002/minf.202400169
Abstract
In silico methods for predicting chemical toxicity can decrease cost and increase efficiency in the early stages of drug discovery. However, because sufficient and reliable toxicity data are hard to obtain, constructing robust and accurate prediction models is challenging. Contrastive learning, a type of self-supervised learning, leverages large unlabeled data sets to obtain more expressive molecular representations, which can boost prediction performance on downstream tasks. While molecular graph contrastive learning has attracted growing attention, current models neglect the quality of the negative sample set. Here, we propose a self-supervised pretraining deep learning framework named GCLmf. We first utilize molecular fragments that meet specific conditions as hard negative samples to improve the quality of the negative set, thereby increasing the difficulty of the proxy tasks during pre-training and yielding more informative representations. GCLmf shows excellent predictive power on various molecular property benchmarks and outperforms multiple baselines across 33 toxicity tasks. In addition, we investigate the necessity of introducing hard negatives in model building and the impact of the proportion of hard negatives on the model.
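The key idea in the abstract — using hard negatives (here, molecular fragments similar to the anchor molecule) to make the contrastive proxy task harder during pre-training — can be illustrated with a minimal sketch. This is not the authors' GCLmf implementation; the function name, the InfoNCE-style loss form, and the toy cosine-similarity embeddings are assumptions for illustration only.

```python
import math

def info_nce_with_hard_negatives(anchor, positive, hard_negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for a single anchor (illustrative sketch).

    anchor, positive: embedding vectors (lists of floats); the positive is
    typically an augmented view of the same molecule.
    hard_negatives: list of embedding vectors, e.g. embeddings of molecular
    fragments used as hard negative samples.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    pos_logit = cosine(anchor, positive) / temperature
    neg_logits = [cosine(anchor, n) / temperature for n in hard_negatives]
    # Softmax cross-entropy with the positive pair as the correct "class":
    # the harder the negatives (i.e. the more similar they are to the anchor),
    # the larger the loss, pushing the encoder toward more discriminative
    # representations.
    denom = math.exp(pos_logit) + sum(math.exp(l) for l in neg_logits)
    return -(pos_logit - math.log(denom))
```

With a well-aligned positive and dissimilar negatives the loss is near zero; swapping in a positive that looks like a negative raises it, which is the mechanism the paper exploits when it tunes the proportion of hard negatives.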