Bioinformatics
Computational biology
Transformer
Computer science
Chemistry
Engineering
Biology
Voltage
Biochemistry
Electrical engineering
Gene
Authors
Marco Nicolini, Emanuele Saitto, Ruben Emilio Jimenez Franco, Emanuele Cavalleri, Aldo Javier Galeano Alfonso, Dario Malchiodi, Alberto Paccanaro, Peter N. Robinson, Elena Casiraghi, Giorgio Valentini
Identifier
DOI:10.1016/j.csbj.2025.03.037
Abstract
We introduce Finenzyme, a Protein Language Model (PLM) that employs a multifaceted learning strategy based on transfer learning from a decoder-based Transformer, conditional learning using specific functional keywords, and fine-tuning for the in silico modeling of enzymes. Our experiments show that Finenzyme significantly enhances generalist PLMs like ProGen for the in silico prediction and generation of enzymes belonging to specific Enzyme Commission (EC) categories. Our in silico experiments demonstrate that Finenzyme-generated sequences can diverge from natural ones while retaining predicted tertiary structures, predicted functions, and active sites similar to those of their natural counterparts. We show that the embeddings of the generated sequences computed by both Finenzyme and ESMFold closely resemble those of natural ones, making them suitable for downstream tasks such as EC classification. Clustering analysis based on the primary and predicted tertiary structure of sequences reveals that the generated enzymes form clusters that largely overlap with those of natural enzymes. Overall, these in silico validation experiments indicate that Finenzyme effectively captures the structural and functional properties of target enzymes and could support targeted enzyme engineering tasks in the future.