计算机科学
判别式
同源建模
同源(生物学)
持久同源性
计算生物学
结构母题
人工智能
序列母题
蛋白质结构
模式识别(心理学)
生物
遗传学
算法
基因
生物化学
酶
作者
Jiangyi Shao,Qi Zhang,Ke Yan,Yihe Pang
摘要
Abstract Protein remote homology detection is essential for structure prediction, function prediction, disease mechanism understanding, etc. The remote homology relationship depends on multiple protein properties, such as structural information and local sequence patterns. Previous studies have shown the challenges for predicting remote homology relationship by protein features at sequence level (e.g. position-specific score matrix). Protein motifs have been used in structure and function analysis due to their unique sequence patterns and implied structural information. Therefore, designing a usable architecture to fuse multiple protein properties based on motifs is urgently needed to improve protein remote homology detection performance. To make full use of the characteristics of motifs, we employed the language model called the protein cubic language model (PCLM). It combines multiple properties by constructing a motif-based neural network. Based on the PCLM, we proposed a predictor called PreHom-PCLM by extracting and fusing multiple motif features for protein remote homology detection. PreHom-PCLM outperforms the other state-of-the-art methods on the test set and independent test set. Experimental results further prove the effectiveness of multiple features fused by PreHom-PCLM for remote homology detection. Furthermore, the protein features derived from the PreHom-PCLM show strong discriminative power for proteins from different structural classes in the high-dimensional space. Availability and Implementation: http://bliulab.net/PreHom-PCLM.
科研通智能强力驱动
Strongly Powered by AbleSci AI