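Genome
Promoter
Artificial intelligence
Computer science
Convolutional neural network
Computational biology
Genomics
Deep learning
Comparative genomics
Computational genomics
Feature (linguistics)
Robustness (evolution)
DNA sequencing
DNA microarray
Gene
Biology
Genetics
Gene expression
Philosophy
Linguistics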
Authors
Chen Wang, Junyin Zhang, Li Cheng, Jiawei Wu, Minfeng Xiao, Junfeng Xia, Yannan Bin
Identifier
DOI:10.1109/jbhi.2022.3193224
Abstract
As the number of sequenced phage genomes grows, new bioinformatics methods for phage genome annotation are urgently needed. Promoters are DNA regions that play an important role in the transcriptional regulation of genes. In the post-genomic era, the availability of data makes it possible to build robust computational models for promoter identification. In this work, we introduce DPProm, a two-layer model composed of DPProm-1L and DPProm-2L, to predict promoters and their types in phages. In the first layer, DPProm-1L, a dual-channel deep neural network ensemble method that fuses multi-view features (a sequence feature and a handcrafted feature), identifies whether a DNA sequence is a promoter or a non-promoter. The sequence feature is extracted with a convolutional neural network (CNN). The handcrafted feature is a combination of free energy, GC content, cumulative skew, and Z-curve features. In the second layer, DPProm-2L, which is based on a CNN, is trained to predict the promoter type (host or phage). To enable prediction on whole genomes, DPProm is combined with a novel sequence data processing workflow that contains sliding-window and sequence-merging modules. Experimental results show that DPProm outperforms the state-of-the-art methods and effectively decreases the false positive rate in whole-genome prediction. Furthermore, we provide a user-friendly web server at http://bioinfo.ahu.edu.cn/DPProm. We expect that DPProm can serve as a useful tool for the identification of promoters and their types.
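The handcrafted features and the whole-genome workflow named in the abstract can be illustrated with a minimal Python sketch. This is not the DPProm implementation: all function names are hypothetical, the cumulative skew is assumed to be a running count of G minus C, the Z curve follows the standard three-component definition, and the window size and step are placeholder parameters. Free energy is left out because it depends on a parameter table the abstract does not specify.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cumulative_gc_skew(seq: str) -> List[int]:
    """Assumed definition: running count of G minus C, one value per base."""
    skew, total = [], 0
    for base in seq.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        skew.append(total)
    return skew


def z_curve(seq: str) -> List[Tuple[int, int, int]]:
    """Standard Z-curve coordinates (x, y, z) after each base.

    x: purine (A, G) minus pyrimidine (C, T)
    y: amino (A, C) minus keto (G, T)
    z: weak H-bond (A, T) minus strong H-bond (C, G)
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base in counts:  # ambiguous bases such as N leave the counts unchanged
            counts[base] += 1
        a, c, g, t = (counts[b] for b in "ACGT")
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (c + g)))
    return points


def sliding_windows(length: int, size: int, step: int) -> List[Tuple[int, int]]:
    """Half-open (start, end) windows of a fixed size over a genome of given length."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if length < size:
        return []
    return [(s, s + size) for s in range(0, length - size + 1, step)]


def merge_intervals(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching half-open intervals into longer regions."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

In such a setup, each window from `sliding_windows` would be scored by a classifier, and the windows predicted as promoters would be passed to `merge_intervals` so that overlapping hits are reported as a single candidate region.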