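Deep learning
Convolutional neural network
Range (aeronautics)
Computer science
Artificial intelligence
Function (biology)
Sequence (biology)
Symbol
Dilation (metric space)
Algorithm
Pattern recognition (psychology)
Mathematics
Arithmetic
Combinatorics
Biology
Engineering
Aerospace engineering
Evolutionary biology
Genetics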
Authors
Vikash Kumar,A. Deepak,Ashish Ranjan,Aravind Prakash
Identifier
DOI:10.1109/tcbb.2023.3240169
Abstract
The short- and long-range interactions among amino acids in a protein sequence are primarily responsible for the function performed by the protein. Recently, convolutional neural networks (CNNs) have produced promising results on sequential data, including NLP tasks and protein sequences. However, the strength of CNNs lies primarily in capturing short-range interactions; they are less effective at capturing long-range interactions. Dilated CNNs, on the other hand, are good at capturing both short- and long-range interactions because of their varied (short and long) receptive fields. Further, CNNs are quite lightweight in terms of trainable parameters, whereas most existing deep learning solutions for protein function prediction (PFP) are multi-modal, rather complex, and heavily parametrized. In this paper, we propose Lite-SeqCNN, a simple, lightweight, sequence-only PFP framework based on sub-sequences and dilated CNNs. By varying dilation rates, Lite-SeqCNN efficiently captures both short- and long-range interactions and has (0.50–0.75 times) fewer trainable parameters than its contemporary deep learning models. Further, Lite-SeqCNN$^+$ is an ensemble of three Lite-SeqCNNs developed with different segment sizes that produces even better results than the individual models. The proposed architecture produced improvements of up to 5% over the state-of-the-art approaches Global-ProtEnc Plus, DeepGOPlus, and GOLabeler on three different prominent datasets curated from the UniProt database.
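To illustrate why varying the dilation rate widens the receptive field without adding parameters, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the kernel size of 3, the dilation rates, and the non-overlapping segmentation scheme are all assumptions chosen for illustration, since the abstract does not specify them.

```python
def receptive_field(kernel_size, dilation_rates):
    """Receptive field (in residues) of stacked stride-1 1D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilation_rates:
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D dilated cross-correlation over a list of numbers."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]


def split_into_segments(sequence, segment_size):
    """Split a sequence into non-overlapping sub-sequences (assumed scheme)."""
    return [
        sequence[i:i + segment_size]
        for i in range(0, len(sequence), segment_size)
    ]


# Three layers with kernel size 3: same parameter count, different reach.
plain = receptive_field(3, [1, 1, 1])    # standard convolutions
dilated = receptive_field(3, [1, 2, 4])  # hypothetical dilation rates
print(plain, dilated)

# With dilation 2, a 3-tap kernel reads positions i, i+2, i+4.
print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1, 1], dilation=2))

# Hypothetical segment size of 4 on a toy amino-acid string.
print(split_into_segments("MKTAYIAKQR", 4))
```

With the same three layers and the same number of weights, the dilated stack covers 15 positions instead of 7, which is the sense in which dilation lets a lightweight network reach longer-range interactions.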