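Computer science
Transformer
Schema (genetic algorithms)
Protein sequencing
Multiple sequence alignment
Sequence (biology)
Sequence alignment
Artificial intelligence
Data mining
Computational biology
Pattern recognition (psychology)
Bioinformatics
Peptide sequence
Machine learning
Biology
Genetics
Engineering
Gene
Voltage
Electrical engineering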
Authors
Armin Behjati, Fatemeh Zare-Mirakabad, Seyed Shahriar Arab, Abbas Nowzari-Dalini
Identifier
DOI:10.1016/j.compbiolchem.2022.107717
Abstract
Profiles are used to model protein families and domains. They are built from multiple sequence alignments, which are obtained by searching a query sequence against a database; the profile is then generated according to a substitution scoring matrix. Profile applications therefore depend heavily on the alignment algorithm and on the scoring system for amino acid substitution. Sometimes, however, the database contains no sequences similar to the query under the scoring scheme, and in these cases no profile can be built. This paper proposes a method named PA_SPP, based on the pre-trained ProtAlbert transformer, that predicts the profile for a single protein sequence without alignment. Transformers perform impressively on natural languages, and because protein sequences can be viewed as a language, they can benefit from these models. We analyze the attention heads in different layers of ProtAlbert to show that the transformer can capture five essential protein characteristics of a single sequence. This assessment shows that ProtAlbert takes some protein properties into account when suggesting amino acids for each position in the sequence. In other words, transformers can be considered an appropriate alternative to alignment and scoring schemes for predicting a profile. We evaluate PA_SPP on the CASP13 dataset, which includes 55 proteins. In addition, one thermophilic and two mesophilic proteins are used as case studies. The results show high similarity between the predicted profiles and HSSP profiles.
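The idea of predicting a profile from a single sequence can be illustrated with a minimal sketch. It assumes a masked-prediction style of use, in which each position is hidden in turn and a model is asked to score the 20 amino acids for it; the abstract does not spell out the actual PA_SPP procedure, so this is an illustration, not the authors' implementation. The names `predict_profile`, `toy_predictor`, and `MASK` are hypothetical, and `toy_predictor` is only a stand-in that scores residues by their frequency in the rest of the sequence; a real setup would plug in a pre-trained protein language model such as ProtAlbert instead.

```python
from collections import Counter
from typing import Callable, Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed column order
MASK = "X"  # hypothetical mask symbol, chosen for this sketch only

# A predictor takes the masked sequence and the masked position and
# returns a non-negative score for each amino acid.
Predictor = Callable[[str, int], Dict[str, float]]


def toy_predictor(masked_sequence: str, position: int) -> Dict[str, float]:
    """Stand-in for a language model (assumption): scores each residue by
    its count in the unmasked context plus a pseudocount of 1."""
    context = masked_sequence[:position] + masked_sequence[position + 1:]
    counts = Counter(context)
    return {aa: counts.get(aa, 0) + 1.0 for aa in AMINO_ACIDS}


def predict_profile(sequence: str,
                    predictor: Predictor = toy_predictor) -> List[List[float]]:
    """Return an L x 20 matrix; row i is a probability distribution over
    AMINO_ACIDS for position i, obtained without any alignment."""
    profile = []
    for i in range(len(sequence)):
        # Hide position i so the predictor cannot simply copy the residue.
        masked = sequence[:i] + MASK + sequence[i + 1:]
        scores = predictor(masked, i)
        total = sum(scores.get(aa, 0.0) for aa in AMINO_ACIDS)
        # Normalize the scores so that each row sums to 1.
        profile.append([scores.get(aa, 0.0) / total for aa in AMINO_ACIDS])
    return profile


if __name__ == "__main__":
    for row in predict_profile("ACDA"):
        print(" ".join(f"{p:.3f}" for p in row))
```

Keeping the predictor as a plain callable separates the profile-building loop from the model, so the same loop could be compared against alignment-derived profiles such as HSSP once a real model is substituted.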