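Interpretability
Computer science
Artificial intelligence
Machine learning
Graph
Robustness (evolution)
Generative grammar
Protein sequencing
Data mining
Theoretical computer science
Gene
Peptide sequence
Biochemistry
Chemistry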
Authors
Congjing Wang, Yifei Wang, Pengju Ding, Shan Li, Xu Yu, Bin Yu
Identifier
DOI:10.1016/j.compbiomed.2024.107944
Abstract
Predicting multi-label protein subcellular localization (SCL) is a key problem in bioinformatics research. Recent advances in protein structure research have made it practical to apply graph neural networks to this task. This paper introduces a novel approach, ML-FGAT. It first extracts node information for proteins from sequence data, physicochemical properties, evolutionary information, and structural details. Multiple evolutionary techniques are then combined to fuse this multi-view information, and an entropy-weighted linear discriminant analysis framework reduces the dimensionality of the fused features. To improve the robustness of the model, the training dataset is augmented with feature generative adversarial networks. In the main prediction step, graph attention networks predict multi-label protein SCL using information from each node and its neighbors, and analyzing the learned attention weights improves interpretability. The model is trained on the Gram-positive bacteria dataset and validated on newly constructed human, virus, Gram-negative bacteria, plant, and SARS-CoV-2 datasets. Under leave-one-out cross-validation, ML-FGAT shows notable superiority over existing methods in this domain.
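The abstract names an entropy-weight step and a graph attention network but gives no formulas. What follows is a minimal NumPy sketch built on stated assumptions. It is not the authors' ML-FGAT implementation.

- `entropy_weights` implements the standard entropy weight method. The abstract does not say how these weights enter the LDA framework, so here they only rescale features.
- `gat_layer` is a standard single-head graph attention layer.
- `predict_locations` is a hypothetical head. It applies an independent sigmoid per location so that one protein can receive several labels.

The graph (each node a protein, edges between related proteins), all weight matrices, and the toy data are also assumptions. LDA itself, the feature-GAN augmentation, and feature extraction are omitted.

```python
import numpy as np


def entropy_weights(X):
    """Standard entropy weight method (EWM).

    X: (n_samples, n_features) feature matrix with n_samples > 1.
    Returns one non-negative weight per feature, summing to 1; features whose
    values are more dispersed across samples (lower entropy) get larger weights.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Min-max normalise each feature column to [0, 1]
    span = X.max(axis=0) - X.min(axis=0)
    Z = (X - X.min(axis=0)) / np.where(span == 0, 1.0, span)
    # Proportions p_ij; an all-zero (constant) column is treated as uniform
    col_sum = Z.sum(axis=0)
    P = np.full_like(Z, 1.0 / n)
    nz = col_sum > 0
    P[:, nz] = Z[:, nz] / col_sum[nz]
    # Entropy e_j = -(1 / ln n) * sum_i p_ij ln p_ij, with 0 * ln 0 := 0
    e = -(P * np.log(np.where(P > 0, P, 1.0))).sum(axis=0) / np.log(n)
    d = np.clip(1.0 - e, 0.0, None)  # divergence; ~0 for a constant feature
    if d.sum() == 0:
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return d / d.sum()


def gat_layer(H, A, W, a, slope=0.2):
    """Single-head graph attention layer (standard GAT formulation).

    H: (N, F) node features; A: (N, N) 0/1 adjacency; W: (F, F2) projection;
    a: (2 * F2,) attention vector. Returns (new node features, attention matrix).
    """
    Wh = H @ W
    f2 = Wh.shape[1]
    # Raw scores e_ij = LeakyReLU(a^T [Wh_i || Wh_j]), using the two halves of a
    e = (Wh @ a[:f2])[:, None] + (Wh @ a[f2:])[None, :]
    e = np.where(e > 0, e, slope * e)
    # Each node attends only to itself and its neighbours
    mask = (A + np.eye(A.shape[0])) > 0
    e = np.where(mask, e, -np.inf)
    # Row-wise softmax gives attention coefficients alpha_ij
    e = e - e.max(axis=1, keepdims=True)
    att = np.exp(e)
    att /= att.sum(axis=1, keepdims=True)
    return att @ Wh, att


def predict_locations(H, A, W1, a1, W2, threshold=0.5):
    """Hypothetical head: one GAT layer, ELU, then an independent sigmoid
    per location so a protein can be assigned several locations."""
    H1, att = gat_layer(H, A, W1, a1)
    H1 = np.where(H1 > 0, H1, np.expm1(np.minimum(H1, 0.0)))  # ELU
    probs = 1.0 / (1.0 + np.exp(-(H1 @ W2)))
    return probs >= threshold, probs, att


# Toy usage: 4 proteins, 6 features, 3 candidate locations, random weights
rng = np.random.default_rng(0)
X = rng.random((4, 6))
Xw = X * entropy_weights(X)  # entropy-weighted features
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])  # hypothetical protein graph (path 0-1-2-3)
labels, probs, att = predict_locations(
    Xw, A, rng.normal(size=(6, 8)), rng.normal(size=16), rng.normal(size=(8, 3)))
print(labels.shape)  # (4, 3)
```

The returned `att` matrix holds one attention coefficient per edge, and each row sums to 1. Inspecting a row is one way to see which neighbours contributed most to a node's prediction. This is in the spirit of the attention-weight analysis the abstract mentions, though the paper's exact procedure is not described here.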