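Keywords
Computer science
Sequence (biology)
Protein structure prediction
Graph
Artificial neural network
Artificial intelligence
Machine learning
Source code
Encoding (set theory)
Data mining
Computational biology
Protein structure
Theoretical computer science
Biology
Genetics
Programming language
Biochemistry
Set (abstract data type)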
Authors
A. Del Boca,Simon Mathis
Source
Journal: Cornell University - arXiv
Date: 2023-01-01
Citations: 1
Identifier
DOI:10.48550/arxiv.2306.12231
Abstract
Pre-trained models have been successful in many protein engineering tasks. Most notably, sequence-based models have achieved state-of-the-art performance on protein fitness prediction, while structure-based models have been used experimentally to develop proteins with enhanced functions. However, structure- and sequence-based methods have not been compared on the task of predicting protein variants that outperform the wild-type protein. This paper addresses this gap by comparing the ability of equivariant graph neural networks (EGNNs) and sequence-based approaches to identify promising amino-acid mutations. The results show that our proposed structural approach achieves performance competitive with sequence-based methods while being trained on significantly fewer molecules. Additionally, we find that combining assay-labelled data with structure pre-trained models yields trends similar to those observed with sequence pre-trained models. Our code and trained models can be found at: https://github.com/semiluna/partIII-amino-acid-prediction.
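The abstract names equivariant graph neural networks (EGNNs) as the structural model. The sketch below is not the authors' code (their implementation is in the linked repository); it is a minimal illustration of one layer in the general style of Satorras et al. (2021), assuming a fully connected graph over residues, random-weight two-layer MLPs, arbitrary layer sizes, and a residual feature update. All function names and dimensions here are hypothetical.

```python
import numpy as np


def init_mlp(rng, d_in, d_hidden, d_out):
    """Random weights for a hypothetical two-layer MLP."""
    return (rng.normal(scale=0.1, size=(d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(scale=0.1, size=(d_hidden, d_out)), np.zeros(d_out))


def mlp(params, x):
    """Two-layer MLP with a SiLU hidden activation; works on any leading shape."""
    w1, b1, w2, b2 = params
    z = x @ w1 + b1
    z = z / (1.0 + np.exp(-z))  # SiLU: z * sigmoid(z)
    return z @ w2 + b2


def egnn_layer(params, h, x):
    """One E(n)-equivariant message-passing layer on a fully connected graph.

    h: (N, F) invariant node features (e.g. one-hot residue types, assumed)
    x: (N, 3) node coordinates (e.g. C-alpha positions, assumed)
    """
    phi_e, phi_x, phi_h = params
    n, f = h.shape
    diff = x[:, None, :] - x[None, :, :]                 # (N, N, 3): x_i - x_j
    dist2 = np.sum(diff ** 2, axis=-1, keepdims=True)     # (N, N, 1): invariant
    h_i = np.broadcast_to(h[:, None, :], (n, n, f))
    h_j = np.broadcast_to(h[None, :, :], (n, n, f))
    mask = 1.0 - np.eye(n)[:, :, None]                    # drop self-edges

    # Messages see geometry only through squared distances, so they are invariant.
    m = mlp(phi_e, np.concatenate([h_i, h_j, dist2], axis=-1)) * mask  # (N, N, M)

    # Coordinates move along relative position vectors, so they are equivariant.
    weights = mlp(phi_x, m)                               # (N, N, 1)
    x_new = x + np.sum(diff * weights * mask, axis=1) / (n - 1)

    # Node features are updated from aggregated invariant messages (residual form).
    m_agg = m.sum(axis=1)                                 # (N, M)
    h_new = h + mlp(phi_h, np.concatenate([h, m_agg], axis=-1))
    return h_new, x_new


rng = np.random.default_rng(0)
N, F, M, H = 5, 4, 8, 16
params = (init_mlp(rng, 2 * F + 1, H, M),   # phi_e: (h_i, h_j, |x_i - x_j|^2) -> message
          init_mlp(rng, M, H, 1),           # phi_x: message -> scalar coordinate weight
          init_mlp(rng, F + M, H, F))       # phi_h: (h_i, sum of messages) -> feature update
h = rng.normal(size=(N, F))
x = rng.normal(size=(N, 3))

# Apply a random orthogonal transform plus translation to the input coordinates.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = rng.normal(size=3)
h1, x1 = egnn_layer(params, h, x)
h2, x2 = egnn_layer(params, h, x @ q.T + t)
print(np.allclose(h1, h2), np.allclose(x1 @ q.T + t, x2))  # True True
```

Because the layer sees coordinates only through squared distances and updates them along relative position vectors, rotating, reflecting, or translating the input structure transforms the output coordinates in the same way and leaves the node features unchanged; the final lines check this numerically. This property is what lets a structural model reason about a protein independently of its pose.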