Genome
Computational biology
Gene prediction
Gene
Eukaryote
Annotation
Biology
Genome project
Computer science
Genetics
Authors
Christopher J. Neely, Sarah K. Hu, Harriet Alexander, Benjamin J. Tully
Identifier
DOI:10.1101/2021.07.25.453296
Abstract
Gene prediction and annotation for eukaryotic genomes are challenging, with large data demands and complex computational requirements. For most eukaryotes, genomes are recovered from specific target taxa. However, it is now feasible to reconstruct or sequence hundreds of metagenome-assembled genomes (MAGs) or single-amplified genomes directly from the environment. To meet this forthcoming wave of eukaryotic genome generation, we introduce EukMetaSanity, which combines state-of-the-art tools into three pipelines that have been specifically designed for extensive parallelization on high-performance computing infrastructure. EukMetaSanity performs an automated taxonomy search against a protein database of 1,482 species to identify phylogenetically compatible proteins to be used in downstream gene prediction. We present the results for intron, exon, and gene locus prediction for 112 genomes collected from NCBI, including fungi, plants, and animals, along with 1,669 MAGs, and demonstrate that EukMetaSanity can provide reliable preliminary gene predictions for a single target taxon or at scale for hundreds of MAGs. EukMetaSanity is freely available at https://github.com/cjneely10/EukMetaSanity.