Biology
Genome
Computational biology
Ciliate
Genetics
Gene
Authors
Chuanqi Jiang, Guangying Wang, Jing Zhang, Siyu Gu, Xueyan Wang, Weiwei Qin, Kai Chen, Dan Yuan, Xiaocui Chai, Mingkun Yang, Fang Zhou, Jie Xiong, Wei Miao
Identifier
DOI: 10.1111/1755-0998.13782
Abstract
Ciliates are a large group of ubiquitous and highly diverse single-celled eukaryotes that play an essential role in the functioning of microbial food webs. However, their genomic diversity remains poorly understood: cultivation methods have yet to be developed for most species, so most research relies on wild-collected organisms that almost invariably contain contaminants. Here we establish an integrated Genome Decontamination Pipeline (iGDP) that combines homology search, telomere reads-assisted and clustering approaches to filter contaminants out of ciliate genome assemblies from wild specimens. We benchmarked iGDP using genomic data from a contaminated ciliate culture: it recalled 91.9% of the target sequences with 96.9% precision. We also used a synthetic dataset to provide guidelines for applying iGDP to remove various groups of contaminants. In comparisons with several popular metagenome binning tools, iGDP performed better. To further validate iGDP on real-world data, we applied it to the genome assemblies of three wild ciliate specimens and obtained genomes whose quality is comparable to that of previously well-studied model ciliate genomes. We anticipate that the newly generated genomes and the iGDP method will be valuable community resources for detailed studies of ciliate biodiversity, phylogeny, ecology and evolution. The pipeline (https://github.com/GWang2022/iGDP) runs automatically, reducing manual filtering and classification, and may be further developed for use with other microeukaryotes.
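The abstract names a telomere-based signal and reports recall and precision without spelling either out. The Python below is a minimal sketch of both ideas under stated assumptions; it is not iGDP's implementation. It works on contig ends rather than raw telomere-containing reads, assumes a Tetrahymena-type TTGGGG repeat (ciliates differ; Oxytricha, for example, uses TTTTGGGG), and uses the hypothetical helpers `has_telomere` and `precision_recall` with illustrative values for `window` and `min_copies`. Precision is the fraction of retained contigs that are true targets; recall is the fraction of true targets that were retained. The contigs and the `truth` labels are invented toy data.

```python
import re

# Assumption: Tetrahymena-type telomere repeat (G-rich strand). Other ciliates
# differ (e.g. TTTTGGGG in Oxytricha), so the motif is a parameter.
DEFAULT_REPEAT = "TTGGGG"


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def has_telomere(contig: str, repeat: str = DEFAULT_REPEAT,
                 min_copies: int = 3, window: int = 100) -> bool:
    """Hypothetical helper: True if either terminal `window` of the contig
    holds >= min_copies tandem copies of the repeat on either strand."""
    g_strand = re.escape(repeat.upper())
    c_strand = re.escape(revcomp(repeat.upper()))  # e.g. CCCCAA for TTGGGG
    pattern = re.compile(
        f"(?:{g_strand}){{{min_copies},}}|(?:{c_strand}){{{min_copies},}}"
    )
    seq = contig.upper()
    # Check only the two contig ends, where telomeres would sit.
    return bool(pattern.search(seq[:window]) or pattern.search(seq[-window:]))


def precision_recall(kept: set[str], targets: set[str]) -> tuple[float, float]:
    """Precision = TP / |kept|, recall = TP / |targets|, TP = |kept & targets|."""
    tp = len(kept & targets)
    precision = tp / len(kept) if kept else 0.0
    recall = tp / len(targets) if targets else 0.0
    return precision, recall


# Toy data (invented): ctg1, ctg3, ctg4 are "ciliate"; ctg2 is a contaminant.
contigs = {
    "ctg1": "CCCCAA" * 3 + "ATGC" * 30 + "TTGGGG" * 3,  # telomeres at both ends
    "ctg2": "ACGT" * 50,                                # contaminant, no telomere
    "ctg3": "CCCCAA" * 3 + "GATTACA" * 20,              # telomere at left end only
    "ctg4": "GATTACA" * 20,                             # ciliate, telomere not assembled
}
truth = {"ctg1", "ctg3", "ctg4"}  # hypothetical ground-truth labels

kept = {name for name, seq in contigs.items() if has_telomere(seq)}
p, r = precision_recall(kept, truth)
print(f"kept={sorted(kept)} precision={p:.2f}, recall={r:.2f}")
# → kept=['ctg1', 'ctg3'] precision=1.00, recall=0.67
```

In this toy case the telomere signal keeps only genuine targets (high precision) but misses `ctg4`, whose telomere was not assembled, so recall drops. That gap illustrates why a telomere signal is more useful combined with homology search and clustering, as the abstract describes, than used alone.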