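Matthews correlation coefficient
Autoencoder
Encoder
Transformer
Benchmark (surveying)
Pathogenicity
Machine learning
Artificial neural network
Pattern recognition (psychology)
Biology
Support vector machine
Artificial intelligence
Computer science
Engineering
Microbiology
Geodesy
Voltage
Geography
Electrical engineering
Operating system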
Authors
Zihao Yan,Fang Ge,Yan Liu,Yumeng Zhang,Fuyi Li,Jiangning Song,Dong‐Jun Yu
Identifier
DOI:10.1021/acs.jcim.3c02019
Abstract
Studying the effect of single amino acid variations (SAVs) on protein structure and function is integral to advancing our understanding of molecular processes, evolutionary biology, and disease mechanisms. Screening for deleterious variants is one of the crucial issues in precision medicine. Here, we propose TransEFVP, a novel computational approach that combines large-scale protein language model embeddings with a transformer-based neural network to predict disease-associated SAVs. The model adopts a two-stage architecture: in the first stage, a transformer encoder fuses different feature embeddings; in the second stage, after dimensionality reduction, a support vector machine (SVM) quantifies the pathogenicity of SAVs. On blind test data, TransEFVP achieves a Matthews correlation coefficient (MCC) of 0.751, an F1-score of 0.846, and an area under the receiver operating characteristic curve (AUROC) of 0.871, higher than existing state-of-the-art methods. These benchmark results indicate that TransEFVP can serve as an accurate and effective method for predicting SAV pathogenicity. The data and code for TransEFVP are available at https://github.com/yzh9607/TransEFVP/tree/master for academic use.
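The abstract reports MCC, F1-score, and AUROC on the blind test set. The sketch below shows how these standard binary-classification metrics are computed; it is not the authors' evaluation code (that lives in the linked repository). It assumes label 1 marks a pathogenic SAV and 0 a benign one, and that the classifier also emits a continuous score used for AUROC. The function names and the toy labels and scores are hypothetical and illustrative only; they are not data from the paper.

```python
import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN); assumes labels are 1 = pathogenic, 0 = benign."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f1(tp: int, fp: int, fn: int) -> float:
    """F1-score = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"MCC={mcc(tp, fp, tn, fn):.3f} F1={f1(tp, fp, fn):.3f} AUROC={auroc(y_true, scores):.3f}")
```

The pairwise AUROC is easy to verify by hand but scales quadratically. For larger test sets, a rank-based formulation or a library routine such as scikit-learn's `roc_auc_score` is the more practical choice.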