Identifying Phage Sequences From Metagenomic Data Using Deep Neural Network With Word Embedding and Attention Mechanism

Authors

Lijia Ma, Wenwei Deng, Yuan Bai, Zhanwei Du, Minfeng Xiao, Lin Wang, Jianqiang Li, Asoke K. Nandi

Source

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics (publisher: Institute of Electrical and Electronics Engineers)
Date: 2023-11-01; Volume (Issue): 20 (6), pp. 3772-3785

Identifiers

DOI: 10.1109/tcbb.2023.3322870

Abstract

Phages are viruses that infect bacteria, and they play important roles in microbial communities and ecosystems. In recent years, phage research has attracted great attention because of the wide application of phage therapy to bacterial infections. Metagenomic sequencing can sequence microbial communities directly from an environmental sample, and identifying phage sequences in metagenomic data is a vital step for downstream phage analysis. However, existing phage identification methods are limited in how they use phage features for prediction, so their prediction performance still needs to be improved. In this article, we propose a novel deep neural network, called MetaPhaPred, for identifying phages from metagenomic data. In MetaPhaPred, we first use a word embedding technique to encode metagenomic sequences into word vectors, extracting latent feature vectors of DNA words. We then design a deep neural network that uses a convolutional neural network (CNN) to capture the feature maps in sequences and a bi-directional long short-term memory network (Bi-LSTM) to capture long-term dependencies between features in both the forward and backward directions. The feature map consists of a set of feature patterns, each of which is a weighted feature extracted as a convolution filter (kernel) in the CNN slides along the input feature vectors. Next, an attention mechanism is used to enhance the contributions of important features. Experimental results on both simulated and real metagenomic data with different sequence lengths demonstrate the superiority of MetaPhaPred over state-of-the-art methods in identifying phage sequences.
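The abstract describes the pipeline only at a high level. Below is a minimal standard-library Python sketch of several of those steps, written under stated assumptions and not taken from MetaPhaPred itself: a sequence is split into overlapping k-mers as "DNA words", each word is mapped to an embedding vector, one convolution filter slides along the word vectors to produce a row of the feature map, and a softmax attention weights the time steps before pooling. The function names (`dna_words`, `build_vocab`, `embed`, `conv1d`, `attention_pool`) are hypothetical. The following are all illustrative assumptions: k=3, stride 1, ReLU, the tanh-scored attention, random weights in place of trained embeddings and filters, and the logistic output. The Bi-LSTM layer the abstract places between the CNN and the attention step is omitted for brevity.

```python
import math
import random
from itertools import product


def dna_words(seq, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mers ("DNA words").

    k=3 and stride=1 are illustrative assumptions, not MetaPhaPred's settings.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(k=3):
    """Assign ids 1..4**k to every k-mer over ACGT; id 0 is reserved for
    unknown words such as k-mers containing N."""
    return {"".join(p): i + 1 for i, p in enumerate(product("ACGT", repeat=k))}


def embed(words, vocab, table):
    """Look up one vector per word; unknown words map to row 0 of the table."""
    return [table[vocab.get(w, 0)] for w in words]


def conv1d(vectors, kernel, bias=0.0):
    """Slide one convolution filter along the word vectors (no padding).

    `kernel` holds `width` weight vectors; each output value is the weighted
    sum over a window of `width` consecutive word vectors plus `bias`,
    passed through ReLU. The output is one row of the feature map.
    """
    width = len(kernel)
    out = []
    for t in range(len(vectors) - width + 1):
        s = bias
        for j in range(width):
            s += sum(w * x for w, x in zip(kernel[j], vectors[t + j]))
        out.append(max(0.0, s))
    return out


def attention_pool(steps, score_weights):
    """Softmax attention over time steps (a simple tanh-scored variant,
    assumed for illustration). Returns the weighted sum and the weights."""
    scores = [math.tanh(sum(w * x for w, x in zip(score_weights, v))) for v in steps]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    pooled = [sum(a * v[d] for a, v in zip(alphas, steps)) for d in range(len(steps[0]))]
    return pooled, alphas


if __name__ == "__main__":
    rng = random.Random(0)
    k, dim, n_filters, width = 3, 4, 2, 3
    vocab = build_vocab(k)
    # Random vectors stand in for trained word embeddings (assumption).
    table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(len(vocab) + 1)]
    vecs = embed(dna_words("ATGCGTACGTTAGCNA", k), vocab, table)  # 14 words
    filters = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
               for _ in range(n_filters)]
    feature_map = [conv1d(vecs, f) for f in filters]  # n_filters rows, 12 columns
    steps = [list(col) for col in zip(*feature_map)]  # 12 time steps, n_filters values each
    # The abstract describes a Bi-LSTM at this point; this sketch skips it.
    pooled, alphas = attention_pool(steps, [rng.uniform(-1, 1) for _ in range(n_filters)])
    # Untrained logistic output (assumption), so the value printed is arbitrary.
    logit = sum(rng.uniform(-1, 1) * x for x in pooled)
    print("P(phage) =", 1 / (1 + math.exp(-logit)))
```

Reserving id 0 for unknown words lets k-mers containing ambiguous bases such as N pass through the pipeline without raising an error. A real implementation would more likely use a deep learning framework's embedding, convolution, LSTM, and attention layers and learn all weights from labeled phage and non-phage sequences.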
