BUSCO: Assessing Genomic Data Quality and Beyond

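Genome, Computer science, Workflow, Computational biology, Genomics, Biology, Gene, Genetics, Database
Authors
Mosè Manni, Matthew Berkeley, Mathieu Seppey, Evgeny M. Zdobnov
Source
Journal: Current Protocols [Wiley]
Volume (issue): 1 (12) Citations: 716
Identifier
DOI: 10.1002/cpz1.323
Abstract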

Evaluation of the quality of genomic "data products" such as genome assemblies or gene sets is of critical importance in order to recognize possible issues and correct them during the generation of new data. It is equally essential to guide subsequent or comparative analyses with existing data, as the correct interpretation of the results necessarily requires knowledge about the quality level and reliability of the inputs. Using datasets of near universal single-copy orthologs derived from OrthoDB, BUSCO can estimate the completeness and redundancy of genomic data by providing biologically meaningful metrics based on expected gene content. These can complement technical metrics such as contiguity measures (e.g., number of contigs/scaffolds, and N50 values). Here, we describe the use of the BUSCO tool suite to assess different data types that can range from genome assemblies of single isolates and assembled transcriptomes and annotated gene sets to metagenome-assembled genomes where the taxonomic origin of the species is unknown. BUSCO is the only tool capable of assessing all these types of sequences from both eukaryotic and prokaryotic species. The protocols detail the various BUSCO running modes and the novel workflows introduced in versions 4 and 5, including the batch analysis on multiple inputs, the auto-lineage workflow to run assessments without specifying a dataset, and a workflow for the evaluation of (large) eukaryotic genomes. The protocols further cover the BUSCO setup, guidelines to interpret the results, and BUSCO "plugin" workflows for performing common operations in genomics using BUSCO results, such as building phylogenomic trees and visualizing syntenies. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC. 
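
The abstract contrasts BUSCO's gene-content metrics with contiguity measures such as N50. For readers unfamiliar with N50, here is a minimal sketch of how it is commonly computed from a list of contig or scaffold lengths. The function name `n50` is illustrative and is not part of BUSCO.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig/scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")

    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Eight contigs totalling 32 bp; the two 8 bp contigs cover half.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

A high N50 says only that the assembly is in large pieces, not that the expected genes are present, which is why the abstract presents the two kinds of metric as complementary.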
Basic Protocol 1: Assessing an input sequence with a BUSCO dataset specified manually
Basic Protocol 2: Assessing an input sequence with a dataset automatically selected by BUSCO
Basic Protocol 3: Assessing multiple inputs
Alternate Protocol: Decreasing analysis runtime when assessing a large number of small genomes with BUSCO auto-lineage workflow and Snakemake
Support Protocol 1: BUSCO setup
Support Protocol 2: Visualizing BUSCO results
Support Protocol 3: Building phylogenomic trees
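
To make the completeness and redundancy metrics from the abstract concrete, the sketch below turns counts of marker genes into percentages. It assumes the usual four BUSCO categories (complete single-copy, complete duplicated, fragmented, missing) and a one-line notation modelled on BUSCO's short summary. The class and function names are hypothetical, and the exact output layout is an assumption rather than BUSCO's own code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuscoCounts:
    """Counts of marker genes per category (hypothetical container)."""
    single: int       # complete and single-copy
    duplicated: int   # complete and duplicated (redundancy)
    fragmented: int   # only partially recovered
    missing: int      # not found

    @property
    def total(self) -> int:
        return self.single + self.duplicated + self.fragmented + self.missing


def summarize(counts: BuscoCounts) -> str:
    """Format counts as percentages of the total number of markers."""
    n = counts.total
    if n == 0:
        raise ValueError("dataset contains no marker genes")

    def pct(part: int) -> str:
        return f"{100.0 * part / n:.1f}%"

    complete = counts.single + counts.duplicated
    return (
        f"C:{pct(complete)}[S:{pct(counts.single)},D:{pct(counts.duplicated)}],"
        f"F:{pct(counts.fragmented)},M:{pct(counts.missing)},n:{n}"
    )


if __name__ == "__main__":
    example = BuscoCounts(single=180, duplicated=10, fragmented=4, missing=6)
    print(summarize(example))
```

In this reading, a high complete fraction suggests that the expected gene content is present, while a high duplicated fraction points to redundancy in the input.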