液态液体
鉴定(生物学)
色谱法
化学
相(物质)
液相
生物系统
物理
生物
热力学
有机化学
植物
作者
Zahoor Ahmed,Kiran Shahzadi,Sebu Aboma Temesgen,Basharat Ahmad,Xiang Chen,Ning Lin,Hasan Zulfiqar,Hao Lin,Yan-Ting Jin
标识
DOI:10.1016/j.ijbiomac.2024.134146
摘要
Liquid-liquid phase separation (LLPS) regulates many biological processes including RNA metabolism, chromatin rearrangement, and signal transduction. Aberrant LLPS potentially leads to serious diseases. Therefore, the identification of the LLPS proteins is crucial. Traditionally, biochemistry-based methods for identifying LLPS proteins are costly, time-consuming, and laborious. In contrast, artificial intelligence-based approaches are fast and cost-effective and can be a better alternative to biochemistry-based methods. Previous research methods employed word2vec in conjunction with machine learning or deep learning algorithms. Although word2vec captures word semantics and relationships, it might not be effective in capturing features relevant to protein classification, like physicochemical properties, evolutionary relationships, or structural features. Additionally, other studies often focused on a limited set of features for model training, including planar π contact frequency, pi-pi, and β-pairing propensities. To overcome such shortcomings, this study first constructed a reliable dataset containing 1206 protein sequences, including 603 LLPS and 603 non-LLPS protein sequences. Then a computational model was proposed to efficiently identify the LLPS proteins by perceiving semantic information of protein sequences directly; using an ESM2-36 pre-trained model based on transformer architecture in conjunction with a convolutional neural network. The model could achieve an accuracy of 85.86 % and 89.26 %, respectively on training data and test data, surpassing the accuracy of previous studies. The performance demonstrates the potential of our computational methods as efficient alternatives for identifying LLPS proteins.
科研通智能强力驱动
Strongly Powered by AbleSci AI