Authors
Yanjing Duan, Xixi Yang, Xiangxiang Zeng, Wen‐Xuan Wang, Youchao Deng, Dongsheng Cao
Identifier
DOI:10.1021/acs.jmedchem.4c00692
Abstract
Precisely predicting molecular properties is crucial in drug discovery, but the scarcity of labeled data poses a challenge for applying deep learning methods. While large-scale self-supervised pretraining has proven an effective solution, it often neglects domain-specific knowledge. To tackle this issue, we introduce Task-Oriented Multilevel Learning based on BERT (TOML-BERT), a dual-level pretraining framework that considers both structural patterns and domain knowledge of molecules. TOML-BERT achieved state-of-the-art prediction performance on 10 pharmaceutical datasets. It has the capability to mine contextual information within molecular structures and extract domain knowledge from massive pseudo-labeled data. The dual-level pretraining accomplished significant positive transfer, with its two components making complementary contributions. Interpretive analysis elucidated that the effectiveness of the dual-level pretraining lies in the prior learning of a task-related molecular representation. Overall, TOML-BERT demonstrates the potential of combining multiple pretraining tasks to extract task-oriented knowledge, advancing molecular property prediction in drug discovery.
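The abstract describes a dual-level objective: a structural (self-supervised) component and a domain-knowledge (pseudo-label) component that contribute complementarily. As an illustration only, the general idea of combining the two losses can be sketched as below; the function name, the simple weighted sum, and the use of cross-entropy plus mean-squared error are assumptions for the sketch, not TOML-BERT's actual recipe.

```python
import numpy as np

def dual_level_loss(mlm_logits, mlm_targets, prop_preds, pseudo_labels, weight=1.0):
    """Toy combination of a masked-token loss (structural level) with a
    pseudo-label regression loss (domain-knowledge level).

    mlm_logits   : (n_masked, vocab_size) scores for each masked position
    mlm_targets  : (n_masked,) true token indices at the masked positions
    prop_preds   : (n_mols,) predicted molecular property values
    pseudo_labels: (n_mols,) pseudo-labels from an auxiliary predictor
    weight       : balance between the two objectives (illustrative)
    """
    # Structural level: cross-entropy over the vocabulary at masked positions
    shifted = mlm_logits - mlm_logits.max(axis=-1, keepdims=True)  # for stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    mlm_loss = -np.mean(np.log(probs[np.arange(len(mlm_targets)), mlm_targets]))

    # Domain-knowledge level: mean squared error against the pseudo-labels
    pl_loss = np.mean((np.asarray(prop_preds) - np.asarray(pseudo_labels)) ** 2)

    return mlm_loss + weight * pl_loss

# Toy usage: 3 masked tokens over a 5-token vocabulary, 4 pseudo-labeled molecules
logits = np.array([[2.0, 0.1, 0.1, 0.1, 0.1],
                   [0.1, 2.0, 0.1, 0.1, 0.1],
                   [0.1, 0.1, 2.0, 0.1, 0.1]])
targets = np.array([0, 1, 2])
preds = np.array([0.9, 0.2, 0.5, 0.7])
pseudo = np.array([1.0, 0.0, 0.5, 0.6])
loss = dual_level_loss(logits, targets, preds, pseudo, weight=0.5)
```

The pseudo-label term is what lets large unlabeled corpora inject approximate domain knowledge despite the scarcity of true labels; the weight controls how strongly that (noisy) signal is allowed to shape the shared representation.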