
Whokaryote: distinguishing eukaryotic and prokaryotic contigs in metagenomes based on gene structure

Keywords: Computational biology; Contig; Gene; Biology; Genetics; Evolutionary biology; Genome
Authors
Lotte J. U. Pronk,Marnix H. Medema
Source
Journal: Microbial Genomics [Microbiology Society]
Volume (issue): 8 (5) Citations: 31
Identifier
DOI: 10.1099/mgen.0.000823
Abstract

Metagenomics has become a prominent technology to study the functional potential of all organisms in a microbial community. Most studies focus on the bacterial content of these communities, while ignoring eukaryotic microbes. Indeed, many metagenomics analysis pipelines silently assume that all contigs in a metagenome are prokaryotic, likely resulting in less accurate annotation of eukaryotes in metagenomes. Early detection of eukaryotic contigs allows for eukaryote-specific gene prediction and functional annotation. Here, we developed a classifier that distinguishes eukaryotic from prokaryotic contigs based on foundational differences between these taxa in terms of gene structure. We first developed Whokaryote, a random forest classifier that uses intergenic distance, gene density and gene length as the most important features. We show that, with an estimated recall, precision and accuracy of 94, 96 and 95 %, respectively, this classifier with features grounded in biology can perform almost as well as the classifiers EukRep and Tiara, which use k-mer frequencies as features. By retraining our classifier with Tiara predictions as an additional feature, the weaknesses of both types of classifiers are compensated; the result is Whokaryote+Tiara, an enhanced classifier that outperforms all individual classifiers, with an F1 score of 0.99 for both eukaryotes and prokaryotes, while still being fast. In a reanalysis of metagenome data from a disease-suppressive plant endospheric microbial community, we show how using Whokaryote+Tiara to select contigs for eukaryotic gene prediction facilitates the discovery of several biosynthetic gene clusters that were missed in the original study. Whokaryote (+Tiara) is wrapped in an easily installable package and is freely available from https://github.com/LottePronk/whokaryote.
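
To make the gene-structure features named in the abstract more concrete, the following is a minimal sketch of how intergenic distance, gene density and gene length might be summarised for a single contig. It is not the authors' implementation: the `Gene` class, the `gene_structure_features` function, the feature names, the genes-per-kilobase scaling and the example coordinates are all assumptions made for illustration. In practice, gene coordinates would come from a gene predictor, and the resulting feature vectors would be passed to a classifier such as a random forest.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Gene:
    """Hypothetical container for one predicted gene on a contig."""
    start: int  # 1-based inclusive start coordinate
    end: int    # 1-based inclusive end coordinate


def gene_structure_features(genes, contig_length):
    """Return illustrative gene-structure features for one contig.

    The feature definitions here are assumptions for this sketch,
    not the definitions used by Whokaryote.
    """
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")

    ordered = sorted(genes, key=lambda g: g.start)
    lengths = [g.end - g.start + 1 for g in ordered]
    # Distance between consecutive genes; overlapping genes yield a negative value.
    gaps = [b.start - a.end - 1 for a, b in zip(ordered, ordered[1:])]

    return {
        # Number of genes per kilobase of contig sequence.
        "gene_density": len(ordered) / contig_length * 1000,
        "mean_gene_length": mean(lengths) if lengths else 0.0,
        "mean_intergenic_distance": mean(gaps) if gaps else 0.0,
    }


# Made-up coordinates for a 2,000 bp contig carrying three genes.
genes = [Gene(101, 400), Gene(501, 1100), Gene(1301, 1900)]
features = gene_structure_features(genes, contig_length=2000)
print(features)
```

Intuitively, prokaryotic contigs tend to be densely packed with genes separated by short intergenic regions, whereas eukaryotic contigs tend to show lower gene density and longer gaps, which is why summary statistics of this kind can carry a taxonomic signal.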
