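Convolutional neural network
Computer science
Machine learning
Function (biology)
Artificial intelligence
Protein function
Computational biology
Biology
Biochemistry
Gene
Evolutionary biology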
Authors
Zhenjiao Du, Yixiang Xu, Changqi Liu, Yonghui Li
Identifier
DOI:10.1021/acs.jafc.3c07143
Abstract
The rising prevalence of allergy demands efficient and accurate bioinformatic tools to expedite allergen identification and risk assessment while also reducing the cost and time of wet-lab experiments. Recently, pretrained protein language models (pLMs) have successfully predicted protein structure and function. However, to the best of our knowledge, they have not been used for predicting allergenic proteins/peptides. Therefore, this study aims to develop robust models for allergenic protein/peptide prediction using five pLMs of varying sizes and to systematically assess their performance through fine-tuning with a convolutional neural network. The developed pLM4Alg models achieved state-of-the-art performance, with accuracy, Matthews correlation coefficient, and area under the curve of 93.4–95.1%, 0.869–0.902, and 0.981–0.990, respectively. Moreover, pLM4Alg is the first model capable of handling prediction tasks involving sequences with missing residues and sequences containing nonstandard amino acid residues. To facilitate easy access, a user-friendly web server (https://f6wxpfd3sh.us-east-1.awsapprunner.com) has been established. pLM4Alg is expected to become the leading machine learning-based prediction model for allergenic peptides and proteins. Its combination with other predictors holds great promise for accelerating allergy research.
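The three metrics quoted in the abstract (accuracy, Matthews correlation coefficient, and area under the ROC curve) can be illustrated with a minimal sketch for a binary allergen/non-allergen classifier. This is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = allergen)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative
    (ties count as half). Quadratic in sample count, fine for a small example."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: labels and predicted allergenicity scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", mcc(*counts))
print("AUC:", roc_auc(y_true, scores))
```

Accuracy and MCC depend on the chosen decision threshold, whereas AUC is computed from the raw scores, so the three numbers are not interchangeable. MCC uses all four confusion-matrix cells, which makes it more informative than accuracy when the classes are imbalanced.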