Authors
Ling-Yun Luo, Hui Wu, Liming Zhao, Yahui Zhang, Jia-Hui Huang, Qiuyue Liu, Haitao Wang, Dong-Xin Mo, EEr Hehua, Lianquan Zhang, Hailiang Chen, Shangang Jia, Xiaogang Wang, Meng-Hua Li
Abstract
Despite ongoing efforts to improve sheep reference genome assemblies, many gaps and incomplete regions remain, causing common failures and errors in sheep genomic studies. Here, we report a complete, gap-free telomere-to-telomere (T2T) genome of a ram (T2T-sheep1.0) with a size of 2.85 Gb, including all autosomes and chromosomes X and Y. Relative to the most recent reference assembly, ARS-UI_Ramb_v3.0, it adds 220.05 Mb of previously unresolved regions (PURs) and 754 new genes, and it contains four types of repeat units (SatI, SatII, SatIII, and CenY) in the centromeric regions. T2T-sheep1.0 exhibits a base accuracy of >99.999%, corrects several structural errors in previous reference assemblies, and improves structural variant (SV) detection in repetitive sequences. We identified 192,265 SVs, including 16,885 new SVs in the PURs, from PacBio long-read sequences of 18 globally representative sheep. Applied to whole-genome short-read sequences of 810 wild and domestic sheep representing 158 global populations and seven wild species, T2T-sheep1.0 as the reference genome improved population genetic analysis based on ~133.31 million SNPs and 1,265,266 SVs, including 2,664,979 novel SNPs and 196,471 novel SVs. T2T-sheep1.0 also improves tests for selection by detecting several novel genes and variants, including those associated with domestication (e.g., ABCC4) and selection for the wool fineness trait (e.g., FOXQ1) in tandemly duplicated regions.