计算机科学
基因本体论
蛋白质功能预测
本体论
人工神经网络
人工智能
图形
DNA测序
机器学习
数据挖掘
作者
Chenguang Zhao,Tong Liu,Zheng Wang
标识
DOI:10.1093/nargab/lqac004
摘要
Abstract High-throughput sequencing technologies have generated massive protein sequences, but the annotations of protein sequences highly rely on the low-throughput and expensive biological experiments. Therefore, accurate and fast computational alternatives are needed to infer functional knowledge from protein sequences. The gene ontology (GO) directed acyclic graph (DAG) contains the hierarchical relationships between GO terms but is hard to be integrated into machine learning algorithms for functional predictions. We developed a deep learning system named PANDA2 to predict protein functions, which used the cutting-edge graph neural network to model the topology of the GO DAG and integrated the features generated by transformer protein language models. Compared with the top 10 methods in CAFA3, PANDA2 ranked first in cellular component ontology (CCO), tied first in biological process ontology (BPO) but had a higher coverage rate, and second in molecular function ontology (MFO). Compared with other recently-developed cutting-edge predictors DeepGOPlus, GOLabeler, and DeepText2GO, and benchmarked on another independent dataset, PANDA2 ranked first in CCO, first in BPO, and second in MFO. PANDA2 can be freely accessed from http://dna.cs.miami.edu/PANDA2/.
科研通智能强力驱动
Strongly Powered by AbleSci AI