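Polyploid
Genome
Ploidy
Loss of heterozygosity
Biology
Computational biology
Profiling (computer programming)
Genome size
Genetics
Computer science
Gene
Allele
Operating system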
Authors
T. Rhyker Ranallo-Benavidez, Kamil S. Jaroň, Michael C. Schatz
Identifier
DOI:10.1038/s41467-020-14998-3
Abstract
An important assessment prior to genome assembly and related analyses is genome profiling, where the k-mer frequencies within raw sequencing reads are analyzed to estimate major genome characteristics such as size, heterozygosity, and repetitiveness. Here we introduce GenomeScope 2.0 (https://github.com/tbenavi1/genomescope2.0), which applies combinatorial theory to establish a detailed mathematical model of how k-mer frequencies are distributed in heterozygous and polyploid genomes. We describe and evaluate a practical implementation of the polyploid-aware mixture model that quickly and accurately infers genome properties across thousands of simulated and several real datasets spanning a broad range of complexity. We also present a method called Smudgeplot (https://github.com/KamilSJaron/smudgeplot) to visualize and estimate the ploidy and genome structure of a genome by analyzing heterozygous k-mer pairs. We successfully apply the approach to systems of known variable ploidy levels in the Meloidogyne genus and the extreme case of octoploid Fragaria × ananassa.
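To make the notion of a k-mer frequency profile concrete, the following is a minimal Python sketch of how a k-mer spectrum could be computed from reads. It is not the GenomeScope 2.0 or Smudgeplot implementation, and it does not fit the polyploid-aware mixture model described in the abstract. The function names (`canonical`, `kmer_spectrum`, `naive_genome_size`) are hypothetical, and the size estimate is the common textbook heuristic (total k-mer occurrences divided by the peak coverage), included only for illustration.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_spectrum(reads, k):
    """Map each coverage value to the number of distinct canonical k-mers seen that often."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def naive_genome_size(spectrum, min_cov=1):
    """Crude heuristic: total k-mer occurrences divided by the peak coverage.

    Coverages below min_cov are ignored (assumed here to be sequencing errors).
    This is an illustrative simplification, not the model from the paper.
    """
    kept = {cov: n for cov, n in spectrum.items() if cov >= min_cov}
    if not kept:
        raise ValueError("no k-mers at or above min_cov")
    # Peak = coverage shared by the most distinct k-mers (ties go to higher coverage).
    peak = max(kept, key=lambda cov: (kept[cov], cov))
    total = sum(cov * n for cov, n in kept.items())
    return total / peak


if __name__ == "__main__":
    spectrum = kmer_spectrum(["ACGTACGT"], k=4)
    print(dict(spectrum))
    print(naive_genome_size(spectrum))
```

In practice the spectrum would be produced by a dedicated k-mer counter on real sequencing data; in a heterozygous or polyploid genome it shows several peaks, which is why a single-peak heuristic like the one above is insufficient and a mixture model is needed.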