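Convolutional neural network
Transposable element
Benchmarking
Computer science
Genome
Artificial intelligence
Tree (set theory)
Artificial neural network
Pattern recognition (psychology)
Computational biology
Machine learning
Biology
Genetics
Mathematics
Gene
Mathematical analysis
Business
Marketing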
Authors
Haidong Yan,Aureliano Bombarely,Song Li
Source
Journal: Bioinformatics
[Oxford University Press]
Date: 2020-05-12
Volume (issue): pages: 36 (15): 4269-4275
Citations: 92
Identifier
DOI:10.1093/bioinformatics/btaa519
Abstract
Motivation: Transposable elements (TEs) classification is an essential step to decode their roles in genome evolution. With a large number of genomes from non-model species becoming available, accurate and efficient TE classification has emerged as a new challenge in genomic sequence analysis.

Results: We developed a novel tool, DeepTE, which classifies unknown TEs using convolutional neural networks (CNNs). DeepTE transferred sequences into input vectors based on k-mer counts. A tree structured classification process was used where eight models were trained to classify TEs into super families and orders. DeepTE also detected domains inside TEs to correct false classification. An additional model was trained to distinguish between non-TEs and TEs in plants. Given unclassified TEs of different species, DeepTE can classify TEs into seven orders, which include 15, 24 and 16 super families in plants, metazoans and fungi, respectively. In several benchmarking tests, DeepTE outperformed other existing tools for TE classification. In conclusion, DeepTE successfully leverages CNN for TE classification, and can be used to precisely classify TEs in newly sequenced eukaryotic genomes.

Availability and implementation: DeepTE is accessible at https://github.com/LiLabAtVT/DeepTE.

Supplementary information: Supplementary data are available at Bioinformatics online.
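
The abstract says sequences are converted into input vectors based on k-mer counts. As a rough illustration of that general idea only, the sketch below counts every DNA k-mer in a fixed order. The function name `kmer_count_vector`, the choice `k=3` in the usage lines, and the decision to skip windows containing ambiguous bases are all assumptions made for this example; the abstract does not state the value of k or any other encoding detail used by DeepTE, and this is not DeepTE's code.

```python
from itertools import product


def kmer_count_vector(seq, k=3):
    """Count each DNA k-mer in `seq`, in lexicographic A, C, G, T order.

    Hypothetical helper for illustration; not taken from DeepTE.
    """
    alphabet = "ACGT"
    # Enumerate all 4**k possible k-mers so the vector length is fixed.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0] * len(kmers)
    seq = seq.upper()
    # Slide a window of width k across the sequence, one base at a time.
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        position = index.get(window)
        if position is not None:  # assumption: skip windows with N etc.
            counts[position] += 1
    return counts


# Usage: a 3-mer vector always has 4**3 = 64 entries.
vector = kmer_count_vector("ACGTACGTNACG", k=3)
print(len(vector))
```

A fixed-length count vector of this kind lets sequences of different lengths share one input shape, which is one common reason to use k-mer counts as network input.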