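Formalism (music)
Substituent
Generative grammar
Core model
Quantitative structure–activity relationship
Core (optical fiber)
Computer science
Chemistry
Mathematics
Artificial intelligence
Stereochemistry
Art
Musical theatre
Visual arts
Mathematical analysis
Telecommunications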
Authors
Hengwei Chen, Jürgen Bajorath
Identifier
DOI:10.1021/acs.jcim.4c01781
Abstract
In medicinal chemistry, compound optimization relies on the generation of analogue series (AS) for exploring structure–activity relationships (SARs). Potency progression is a critical criterion for advancing AS. During optimization, a key question is which analogues to synthesize next. We introduce a computational methodology, reported here for the first time, that extends AS with potent compounds containing both core structure modifications and substituent modifications at multiple sites. The approach combines a transformer chemical language model (CLM) with a SAR matrix (SARM) methodology that identifies and organizes structurally related AS. For this purpose, the SARM approach was expanded to cover multisite AS. Consensus series extracted from SARMs representing a potency gradient served as input for CLM training to extend test AS with potent analogues. Different model variants were derived and investigated. Both general and fine-tuned models ranked known potent analogues at high positions in probability-based compound rankings, and they chemically diversified AS through core structure modifications and multisite substituent replacements in the generated candidate compounds.