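Computer science
Protein structure prediction
Artificial intelligence
Convolutional neural network
Protein structure
Sequence (biology)
Multiple sequence alignment
Protein sequencing
Machine learning
Pattern recognition (psychology)
Sequence alignment
Biology
Peptide sequence
Genetics
Gene
Biochemistry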
Authors
Konstantin Weißenow,Michael Heinzinger,Burkhard Rost
Source
Journal: Structure
[Elsevier]
Date: 2022-05-23
Volume (issue): pages: 30 (8): 1169-1177.e4
Citations: 84
Identifiers
DOI:10.1016/j.str.2022.05.001
Abstract
Advanced protein structure prediction requires evolutionary information, namely evolutionary couplings derived from multiple sequence alignments (MSAs), which are not always available. Artificial intelligence (AI)-based predictions that input only single sequences are faster but so inaccurate that the speed gain becomes irrelevant. Here, we describe a competitive method for predicting inter-residue distances (2D structure). It feeds only embeddings from a pre-trained protein language model (pLM), namely ProtT5, computed from single sequences, into a convolutional neural network (CNN) with relatively few layers. The major advance came from using the ProtT5 attention heads. Our new method, EMBER2, which never requires any MSAs, performed similarly to other methods that fully rely on co-evolution. Although it clearly does not reach AlphaFold2, our leaner solution came somewhat close at substantially lower cost. By generating protein-specific rather than family-averaged predictions, EMBER2 might better capture some features of particular protein structures. Results from protein engineering and deep mutational scanning (DMS) experiments provided at least a proof of principle for this speculation.
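The pipeline summarized in the abstract can be illustrated with a toy NumPy sketch. Single-sequence pLM outputs are turned into a 2D (residue-pair) input, and a shallow CNN maps that input to per-pair distance predictions. Everything here is an assumption for illustration, not EMBER2's actual architecture. Random arrays stand in for ProtT5 embeddings and attention maps. The pairwise concatenation of per-residue embeddings and the symmetrization of attention maps are common choices, not features confirmed by the paper. The distances are discretized into hypothetical bins, and the weights are untrained. The helper names (`pairwise_features`, `conv2d_same`, `predict_distance_bins`) are hypothetical.

```python
import numpy as np


def pairwise_features(emb, attn):
    """Build a 2D (pairwise) input from single-sequence pLM outputs.

    emb:  (L, d) per-residue embeddings (hypothetical stand-in for ProtT5)
    attn: (H, L, L) attention maps, one per head (hypothetical stand-in)
    returns: (2*d + H, L, L) feature tensor
    """
    L, d = emb.shape
    # Outer concatenation: pair (i, j) receives [emb_i, emb_j]
    row = np.broadcast_to(emb[:, None, :], (L, L, d))
    col = np.broadcast_to(emb[None, :, :], (L, L, d))
    pair = np.concatenate([row, col], axis=-1).transpose(2, 0, 1)
    # Symmetrize attention so that (i, j) and (j, i) share the same value
    attn_sym = 0.5 * (attn + attn.transpose(0, 2, 1))
    return np.concatenate([pair, attn_sym], axis=0)


def conv2d_same(x, w, b):
    """Naive 2D convolution with zero padding (output keeps the L x L size).

    x: (C_in, L, L), w: (C_out, C_in, k, k) with odd k, b: (C_out,)
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero-pad both spatial axes
    n = x.shape[1]
    out = np.empty((c_out, n, n))
    for i in range(n):
        for j in range(n):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window around (i, j)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out


def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def predict_distance_bins(emb, attn, layers):
    """Apply a few conv layers and return per-pair distance-bin probabilities."""
    h = pairwise_features(emb, attn)
    for idx, (w, b) in enumerate(layers):
        h = conv2d_same(h, w, b)
        if idx < len(layers) - 1:
            h = np.maximum(h, 0.0)  # ReLU between hidden layers
    # Distances are symmetric, so average the logits for (i, j) and (j, i)
    h = 0.5 * (h + h.transpose(0, 2, 1))
    return softmax(h, axis=0)  # (n_bins, L, L)


# Toy usage with random inputs and untrained weights (all sizes hypothetical)
rng = np.random.default_rng(0)
L, d, H, hidden, n_bins = 12, 8, 4, 16, 10
emb = rng.normal(size=(L, d))
attn = softmax(rng.normal(size=(H, L, L)), axis=-1)  # rows sum to 1, like attention
c_in = 2 * d + H
layers = [
    (rng.normal(scale=0.1, size=(hidden, c_in, 3, 3)), np.zeros(hidden)),
    (rng.normal(scale=0.1, size=(n_bins, hidden, 3, 3)), np.zeros(n_bins)),
]
probs = predict_distance_bins(emb, attn, layers)
print(probs.shape)  # (10, 12, 12)
```

The explicit double loop in `conv2d_same` keeps the convolution easy to follow. A practical model would instead use a deep-learning framework's optimized convolution and would be trained against distances taken from experimentally determined structures.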