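Genome
Artificial intelligence
Feature (linguistics)
Computer science
Convolutional neural network
Artificial neural network
Embedding
Pattern recognition (psychology)
Deep learning
Word (group theory)
Word embedding
Feature vector
Data mining
Computational biology
Biology
Mathematics
Genetics
Gene
Linguistics
Philosophy
Geometry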
Authors
Lijia Ma,Wenwei Deng,Yuan Bai,Zhanwei Du,Minfeng Xiao,Lin Wang,Jianqiang Li,Asoke K. Nandi
Identifier
DOI:10.1109/tcbb.2023.3322870
Abstract
Phages are viruses that infect bacteria, and they play important roles in microbial communities and ecosystems. Phage research has attracted great attention in recent years owing to the wide application of phage therapy to bacterial infections. Metagenomic sequencing can sequence microbial communities directly from an environmental sample, and identifying phage sequences in metagenomic data is a vital step for downstream phage analysis. However, existing phage identification methods make limited use of phage features for prediction, so their prediction performance still needs further improvement. In this article, we propose a novel deep neural network, called MetaPhaPred, for identifying phages from metagenomic data. In MetaPhaPred, we first use a word embedding technique to encode metagenomic sequences into word vectors, extracting latent feature vectors of DNA words. Then, we design a deep neural network that uses a convolutional neural network (CNN) to capture the feature maps in sequences and a bi-directional long short-term memory network (Bi-LSTM) to capture long-term dependencies between features in both the forward and backward directions. A feature map consists of a set of feature patterns, each of which is a weighted feature extracted as a convolution filter's kernel slides along the input feature vectors. Next, an attention mechanism is used to enhance the contributions of important features. Experimental results on both simulated and real metagenomic data with different sequence lengths demonstrate that MetaPhaPred outperforms state-of-the-art methods in identifying phage sequences.
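The abstract describes the pipeline only at a high level. As a rough illustration, the following plain-Python sketch (no deep-learning framework) traces one forward pass through the stages it names: DNA words, word embedding, CNN feature map, Bi-LSTM, attention, and a final score. Everything concrete here is an assumption, not the authors' configuration: overlapping 3-mers as "DNA words", random weights in place of trained ones (including a random embedding table standing in for learned word vectors), LSTM gates without biases, a single dot-product attention query, and a sigmoid output. The names `phage_score`, `kmer_words`, and the other helpers are hypothetical.

```python
import math
import random
from itertools import product

# Hypothetical forward pass: k-mer "DNA words" -> embedding -> CNN -> Bi-LSTM -> attention -> score.
# All weights are random placeholders, not trained MetaPhaPred parameters.

def kmer_words(seq, k=3):
    """Split a DNA sequence into overlapping k-mer 'words' (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv1d_relu(xs, filters, width):
    """Slide each filter (a flattened width x dim kernel) along the embedded words."""
    fmap = []
    for t in range(len(xs) - width + 1):
        window = [v for vec in xs[t:t + width] for v in vec]  # concatenate `width` word vectors
        fmap.append([max(0.0, sum(w * x for w, x in zip(f, window))) for f in filters])
    return fmap  # one feature pattern (vector of filter responses) per position

def lstm(xs, params, hidden):
    """Single-layer LSTM without biases; params maps gate name -> weights over [x; h]."""
    h, c, outs = [0.0] * hidden, [0.0] * hidden, []
    for x in xs:
        z = x + h  # list concatenation: [x_t; h_{t-1}]
        i = [sigmoid(v) for v in matvec(params["i"], z)]
        f = [sigmoid(v) for v in matvec(params["f"], z)]
        o = [sigmoid(v) for v in matvec(params["o"], z)]
        g = [math.tanh(v) for v in matvec(params["g"], z)]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        outs.append(h)
    return outs

def bilstm(xs, fwd, bwd, hidden):
    """Run one LSTM forward and one backward, then concatenate states per position."""
    forward = lstm(xs, fwd, hidden)
    backward = lstm(xs[::-1], bwd, hidden)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]

def attention_pool(hs, query):
    """Softmax-weighted sum of hidden states; scores are dot products with a query vector."""
    scores = [sum(q * v for q, v in zip(query, h)) for h in hs]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * h[j] for w, h in zip(weights, hs)) for j in range(len(hs[0]))]
    return pooled, weights

def phage_score(seq, k=3, dim=4, n_filters=6, width=3, hidden=5, seed=0):
    """Return (score in (0, 1), attention weights) for one sequence."""
    rng = random.Random(seed)
    # Hypothetical embedding table for all 4**k k-mers (stand-in for trained word vectors).
    table = {"".join(p): [rng.uniform(-1, 1) for _ in range(dim)]
             for p in product("ACGT", repeat=k)}
    xs = [table[w] for w in kmer_words(seq, k) if w in table]  # skips k-mers containing non-ACGT
    if len(xs) < width:
        raise ValueError("sequence too short for the convolution window")
    filters = rand_matrix(n_filters, width * dim, rng)
    fmap = conv1d_relu(xs, filters, width)

    def gate_params():
        return {gate: rand_matrix(hidden, n_filters + hidden, rng) for gate in "ifog"}

    hs = bilstm(fmap, gate_params(), gate_params(), hidden)
    pooled, weights = attention_pool(hs, [rng.uniform(-1, 1) for _ in range(2 * hidden)])
    out_w = [rng.uniform(-1, 1) for _ in range(2 * hidden)]
    return sigmoid(sum(w * v for w, v in zip(out_w, pooled))), weights

score, weights = phage_score("ATGCGTACGTTAGCCGATAAGGCT")
print(f"phage score: {score:.3f} over {len(weights)} positions")
```

Because the weights are untrained, the printed score carries no predictive meaning. The sketch only shows how data flows between the layers and where the attention weights come from. In a real model these parameters would be learned from labelled phage and non-phage sequences, typically with a framework such as PyTorch or TensorFlow. The abstract does not state the actual k-mer size, layer sizes, embedding method, or attention formulation.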