Identifying Phage Sequences From Metagenomic Data Using Deep Neural Network With Word Embedding and Attention Mechanism

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2023)

Abstract
Phages are functional viruses that infect bacteria, and they play important roles in microbial communities and ecosystems. Phage research has attracted great attention in recent years because of the wide application of phage therapy to treating bacterial infections. Metagenomic sequencing can sequence microbial communities directly from an environmental sample. Identifying phage sequences in metagenomic data is a vital step for downstream phage analysis. However, existing methods for phage identification are limited in how they use phage features for prediction, so their prediction performance still needs to be improved. In this article, we propose a novel deep neural network (called MetaPhaPred) for identifying phages from metagenomic data. In MetaPhaPred, we first use a word embedding technique to encode the metagenomic sequences into word vectors, extracting the latent feature vectors of DNA words. Then, we design a deep neural network that combines a convolutional neural network (CNN), which captures the feature maps in sequences, with a bi-directional long short-term memory network (Bi-LSTM), which captures the long-term dependencies between features in both the forward and backward directions. A feature map consists of a set of feature patterns, each of which is a weighted feature extracted as the convolution kernels of a filter in the CNN slide along the input feature vectors. Next, an attention mechanism is used to enhance the contributions of important features. Experimental results on both simulated and real metagenomic data of different lengths demonstrate the superiority of the proposed MetaPhaPred over state-of-the-art methods in identifying phage sequences.
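The first and last stages described in the abstract (turning a sequence into DNA words, and weighting features by attention) can be illustrated with a minimal pure-Python sketch. This is not the MetaPhaPred implementation: the abstract does not say how DNA words are formed or how attention scores are computed, so overlapping k-mers with k = 3, the index-0 fallback for ambiguous bases, and the hand-supplied scores are all assumptions, and the function names `kmer_words`, `build_vocab`, `encode`, and `attention_pool` are hypothetical. The embedding layer, CNN, and Bi-LSTM that would sit between these two steps are omitted.

```python
import math
from itertools import product


def kmer_words(seq, k=3):
    """Split a DNA sequence into overlapping k-mers ("DNA words").

    Assumption: words are overlapping k-mers; the abstract does not specify this.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k=3):
    """Map every k-mer over A, C, G, T to an index starting at 1 (0 means unknown)."""
    return {"".join(p): i + 1 for i, p in enumerate(product("ACGT", repeat=k))}


def encode(seq, vocab, k=3):
    """Convert a sequence to vocabulary indices, the usual input to an embedding layer.

    K-mers containing ambiguous bases such as N are mapped to 0 (an assumption).
    """
    return [vocab.get(word, 0) for word in kmer_words(seq, k)]


def attention_pool(features, scores):
    """Softmax the scores and return (weights, weighted sum of the feature vectors).

    In a trained network the scores would be learned; here they are passed in by hand.
    """
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, pooled
```

For example, `encode("ACGTA", build_vocab(3))` yields one index per overlapping 3-mer (`ACG`, `CGT`, `GTA`), and `attention_pool` gives positions with higher scores a larger share of the pooled feature vector.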
Keywords
Attention mechanism,deep learning,metagenomics,phage identification,word embedding