Authors
Yu Deng, Yuting Qian, Minghui Meng, Haifeng Jiang, Yang Dong, Chengchi Fang, Shunping He, Liandong Yang
Abstract
The zebrafish (Danio rerio) is one of the most widely used model organisms for studying vertebrate gene function and human disease, given that 70% of protein-coding genes are conserved between zebrafish and humans. Two of the most common laboratory zebrafish strains are Tuebingen and AB. Although the zebrafish reference genome is derived from the Tuebingen strain, the AB strain remains widely used, yet it lacks a high-quality genome assembly comparable to that of Tuebingen. Here, we report a 1.40-Gb representative de novo genome assembly of the AB strain (DrAB1), with a contig N50 length of 21 Mb, generated by integrating Illumina short-read sequencing, Nanopore long-read sequencing and Hi-C-based chromatin mapping. Compared with the published zebrafish Zv11 reference genome (GRCz11), this assembly shows considerable improvements in both contiguity and completeness. In addition, we uncovered substantial structural differences and extensive sequence divergence at unprecedented resolution, including 9,029,929 single nucleotide polymorphisms, 2,376,812 InDels, 32,623 insertions, 22,089 deletions and 220 inversions, which together constitute ~2.6% of the DrAB1 genome. Many of these variants may have functional effects on phenotype and should be considered in future experimental designs. Our study thus provides the zebrafish community with additional genomic resources and a high-resolution structural variation map based on whole-genome alignment. The assembly could also serve as an indispensable reference genome from a model species for future research on fish phylogenetic genomics, comparative genomics and adaptive evolution.