变压器
计算机科学
人工智能
试验装置
生物系统
模式识别(心理学)
算法
材料科学
电压
生物
工程类
电气工程
作者
Eesha Khare,Constancio González‐Obeso,David L. Kaplan,Markus J. Buehler
出处
期刊:ACS Biomaterials Science & Engineering
[American Chemical Society]
日期:2022-09-23
卷期号:8 (10): 4301-4310
被引量:20
标识
DOI:10.1021/acsbiomaterials.2c00737
摘要
Collagen is one of the most important structural proteins in biology, and its structural hierarchy plays a crucial role in many mechanically important biomaterials. Here, we demonstrate how transformer models can be used to predict, directly from the primary amino acid sequence, the thermal stability of collagen triple helices, measured via the melting temperature Tm. We report two distinct transformer architectures to compare performance. First, we train a small transformer model from scratch, using our collagen data set featuring only 633 sequence-to-Tm pairings. Second, we use a large pretrained transformer model, ProtBERT, and fine-tune it for a particular downstream task by utilizing sequence-to-Tm pairings, using a deep convolutional network to translate natural language processing BERT embeddings into required features. Both the small transformer model and the fine-tuned ProtBERT model have similar R2 values of test data (R2 = 0.84 vs 0.79, respectively), but the ProtBERT is a much larger pretrained model that may not always be applicable for other biological or biomaterials questions. Specifically, we show that the small transformer model requires only 0.026% of the number of parameters compared to the much larger model but reaches almost the same accuracy for the test set. We compare the performance of both models against 71 newly published sequences for which Tm has been obtained as a validation set and find reasonable agreement, with ProtBERT outperforming the small transformer model. The results presented here are, to our best knowledge, the first demonstration of the use of transformer models for relatively small data sets and for the prediction of specific biophysical properties of interest. We anticipate that the work presented here serves as a starting point for transformer models to be applied to other biophysical problems.
科研通智能强力驱动
Strongly Powered by AbleSci AI