Whokaryote: distinguishing eukaryotic and prokaryotic contigs in metagenomes based on gene structure

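Keywords: Computational biology, Contig, Gene, Biology, Genetics, Evolutionary biology, Genome
Authors
Lotte J. U. Pronk, Marnix H. Medema
Source
Journal: Microbial Genomics [Microbiology Society]
Volume (issue): 8 (5) Citations: 31
Identifier
DOI: 10.1099/mgen.0.000823
Abstract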

Metagenomics has become a prominent technology to study the functional potential of all organisms in a microbial community. Most studies focus on the bacterial content of these communities, while ignoring eukaryotic microbes. Indeed, many metagenomics analysis pipelines silently assume that all contigs in a metagenome are prokaryotic, likely resulting in less accurate annotation of eukaryotes in metagenomes. Early detection of eukaryotic contigs allows for eukaryote-specific gene prediction and functional annotation. Here, we developed a classifier that distinguishes eukaryotic from prokaryotic contigs based on foundational differences between these taxa in terms of gene structure. We first developed Whokaryote, a random forest classifier that uses intergenic distance, gene density and gene length as the most important features. We show that, with an estimated recall, precision and accuracy of 94, 96 and 95 %, respectively, this classifier with features grounded in biology can perform almost as well as the classifiers EukRep and Tiara, which use k-mer frequencies as features. By retraining our classifier with Tiara predictions as an additional feature, the weaknesses of both types of classifiers are compensated; the result is Whokaryote+Tiara, an enhanced classifier that outperforms all individual classifiers, with an F1 score of 0.99 for both eukaryotes and prokaryotes, while still being fast. In a reanalysis of metagenome data from a disease-suppressive plant endospheric microbial community, we show how using Whokaryote+Tiara to select contigs for eukaryotic gene prediction facilitates the discovery of several biosynthetic gene clusters that were missed in the original study. Whokaryote (+Tiara) is wrapped in an easily installable package and is freely available from https://github.com/LottePronk/whokaryote.
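The three gene-structure features named in the abstract (intergenic distance, gene density and gene length) can be illustrated with a minimal sketch. The abstract does not say exactly how Whokaryote computes them, so the definitions below (mean gap between consecutive genes, genes per kilobase, mean gene length), the function name `gene_structure_features` and the example coordinates are all assumptions made for illustration, not the tool's actual implementation.

```python
from statistics import mean


def gene_structure_features(genes, contig_length):
    """Summarise gene structure on one contig.

    genes: list of (start, end) tuples, 1-based inclusive coordinates
           (hypothetical input format, e.g. parsed from a gene predictor).
    contig_length: contig length in base pairs.
    """
    if not genes or contig_length <= 0:
        raise ValueError("need at least one gene and a positive contig length")

    ordered = sorted(genes)
    lengths = [end - start + 1 for start, end in ordered]

    # Gap between the end of one gene and the start of the next;
    # overlapping genes are counted as a gap of 0 (an assumption).
    gaps = [
        max(0, next_start - prev_end - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]

    return {
        "mean_gene_length": mean(lengths),
        "mean_intergenic_distance": mean(gaps) if gaps else 0.0,
        # Assumed definition of density: predicted genes per kilobase.
        "gene_density": len(ordered) / contig_length * 1000,
    }


# Made-up example contig of 3000 bp carrying three genes.
features = gene_structure_features([(1, 900), (951, 1850), (1901, 3000)], 3000)
print(features)
```

A feature table of this kind, with one row per contig, is the sort of input a random forest classifier could be trained on. Eukaryotic contigs would be expected to show longer intergenic distances and lower gene density than prokaryotic ones, which is the biological difference the abstract refers to.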
