Improved structure-related prediction for insufficient homologous proteins using MSA enhancement and pre-trained language model

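Computer science; Protein superfamily; Homology modeling; Threading (protein sequence); Computational biology; Artificial intelligence; Multiple sequence alignment; Data mining; Machine learning; Protein structure; Sequence alignment; Biology; Peptide sequence; Genetics; Gene; Biochemistry
Authors
Qiaozhen Meng, Fei Guo, Jijun Tang
Source
Journal: Briefings in Bioinformatics [Oxford University Press]
Identifier
DOI: 10.1093/bib/bbad217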
Abstract

In recent years, protein structure problems have become a hotspot for understanding protein folding and function mechanisms. Most protein structure methods rely on and benefit from co-evolutionary information obtained from multiple sequence alignments (MSAs). AlphaFold2 (AF2), for example, is a typical MSA-based protein structure tool known for its high accuracy. As a consequence, these MSA-based methods are limited by the quality of the MSAs. Especially for orphan proteins, which have no homologous sequences, AlphaFold2 performs unsatisfactorily as MSA depth decreases, which may pose a barrier to its widespread application in protein mutation and design problems, where rich homologous sequences are unavailable and rapid prediction is needed. In this paper, we constructed two standard datasets for orphan and de novo proteins, which have insufficient or no homology information, called Orphan62 and Design204, respectively, to fairly evaluate the performance of various methods in this setting. Then, depending on whether scarce MSA information is used, we summarized two approaches, MSA-enhanced and MSA-free methods, for effectively addressing the lack of sufficient MSAs. MSA-enhanced models aim to improve poor MSA quality at the data source through knowledge distillation and generation models. MSA-free models directly learn the relationships between residues from enormous numbers of protein sequences via pre-trained models, bypassing the step of extracting the residue pair representation from the MSA. Next, we evaluated the performance of four MSA-free methods (trRosettaX-Single, TRFold, ESMFold and ProtT5) and one MSA-enhanced method (Bagging MSA) against a traditional MSA-based method, AlphaFold2, in two protein structure-related prediction tasks. Comparative analyses show that trRosettaX-Single and ESMFold, both MSA-free methods, achieve fast prediction ($\sim 40$ s) and performance comparable to AF2 in tertiary structure prediction, especially for short peptides, $\alpha$-helical segments and targets with few homologous sequences. In secondary structure prediction, Bagging MSA, which uses MSA enhancement, improves the accuracy of our trained MSA-based base model when homology information is poor. Our study offers biologists insight into how to select rapid and appropriate prediction tools for enzyme engineering and peptide drug development. Contact: guofei@csu.edu.cn, jj.tang@siat.ac.cn
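The abstract names Bagging MSA as an MSA-enhanced method but does not describe how it works. As a rough illustration of the general bagging idea only, the sketch below resamples the homolog rows of a shallow alignment with replacement, runs a base predictor on each resample, and averages the outputs. Everything here is an assumption made for illustration: the function names `column_profile` and `bagged_profile` are hypothetical, the "base model" is a toy per-column residue-frequency profile rather than a structure predictor, and none of it should be read as the authors' implementation.

```python
import random
from collections import Counter


def column_profile(msa):
    """Toy stand-in for a base model: per-column residue frequencies.

    Assumes every row of the alignment has the same length.
    """
    profile = []
    for column in zip(*msa):
        counts = Counter(column)
        profile.append({aa: n / len(column) for aa, n in counts.items()})
    return profile


def bagged_profile(msa, n_bags=10, seed=0):
    """Average the toy predictor over bootstrap resamples of the MSA rows.

    The first row is treated as the query and is kept in every resample.
    """
    rng = random.Random(seed)
    query, homologs = msa[0], msa[1:]
    totals = [Counter() for _ in query]
    for _ in range(n_bags):
        # Bootstrap: draw homologs with replacement, always keep the query.
        resampled = [query] + [rng.choice(homologs) for _ in homologs]
        for pos, dist in enumerate(column_profile(resampled)):
            totals[pos].update(dist)
    # Divide the summed per-bag frequencies by the number of bags.
    return [{aa: p / n_bags for aa, p in col.items()} for col in totals]


if __name__ == "__main__":
    shallow_msa = ["ACD-", "ACE-", "GCDK"]  # hypothetical aligned rows
    for pos, dist in enumerate(bagged_profile(shallow_msa, n_bags=5, seed=1)):
        print(pos, dist)
```

In a real setting the toy profile would be replaced by an actual MSA-based predictor, and the averaging step would operate on that model's outputs (for example, per-residue secondary structure probabilities).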