The gap-free potato genome assembly reveals large tandem gene clusters of agronomical importance in highly repeated genomic regions

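Biology, Genome, Sequence assembly, Tandem, Gene, Computational biology, Genetics, Biotechnology, Evolutionary biology, Gene expression, Composite materials, Transcriptome, Materials science
Authors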
Xiaohui Yang,Lingkui Zhang,Xiao Guo,Jianfei Xu,Kang Zhang,Yinqing Yang,Yang Yu,Yinqiao Jian,Daofeng Dong,Sanwen Huang,Cheng Feng,Guangcun Li
Source
Journal: Molecular Plant [Elsevier]
Volume (Issue): 16 (2): 314-317 Citations: 20
Identifier
DOI:10.1016/j.molp.2022.12.010
Abstract

Potato is a vital food security crop and is ranked as the world’s third most important food crop after rice and wheat. In 2011, the first genome assembly of a doubled monoploid potato, DM1-3 516 R44 (DM), was released (Potato Genome Sequencing Consortium, 2011). It has been one of the most widely used reference genomes of the last decade and a valuable resource for the plant genomics and potato genetics communities (Leisner et al., 2018; Yang et al., 2020; Zheng et al., 2020). The latest version of the DM genome assembly (v6.1) (Pham et al., 2020) has served as a reference and quality control in studies of diploid and tetraploid potatoes (Zhou et al., 2020; Bao et al., 2022; Hoopes et al., 2022; Sun et al., 2022; Tang et al., 2022). However, 161 gaps remain in DM6.1 (v6.1), and the centromere and telomere structures are incomplete. Considering the importance of the DM genome in potato genomics, genetics, and breeding studies, generating a complete genome assembly of DM is of great importance.
In this study, a telomere-to-telomere gap-free genome of DM (DM8.1) (Figure 1A) was assembled by combining Oxford Nanopore Technologies (ONT) ultra-long read sequencing (119.81× coverage) and Hi-C sequencing (130.57×) (Supplemental Table 1), assisted by multiple gap-closing strategies coupled with high-fidelity (HiFi) reads from circular consensus sequencing. A total of 179 contigs with a summed size of 773.36 Mb and a contig N50 of 59.72 Mb were obtained after initial genome assembly, polishing, and decontamination. Hi-C reads further anchored 37 of the 179 contigs onto 12 chromosomes (Supplemental Figure 1; Supplemental Table 2), accounting for 95.53% (738.82 Mb) of the total assembly; we named this assembly preDM8. Of the 142 unanchored contigs (34.53 Mb), over 98% are short sequences (<1 Mb), and all could be aligned to chromosomes with high similarity, indicating that they were repetitive or redundant sequences. preDM8 is more contiguous than DM6.1 and the potato pan-genome assemblies (Tang et al., 2022) (Supplemental Figure 2). However, 25 gaps remained in preDM8. Three methods were adopted to close these gaps (Supplemental Figure 3A; Supplemental Table 3). First, we aligned the ONT reads to preDM8, then collected and assembled the reads mapped to the regions flanking the gaps, which closed 14 gaps. Second, based on syntenic homologous fragments between preDM8 and DM6.1, three gaps were closed with contiguous DM6.1 sequences that spanned these gaps in preDM8. Third, target sequence amplification experiments (Supplemental Figure 3B) and HiFi sequencing were performed, which closed the remaining eight gaps (Supplemental Figures 3C and 4).
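For readers unfamiliar with the contiguity statistics quoted above, the following is a minimal sketch of how a contig N50 and an anchored fraction are conventionally computed. The function names and the toy contig lengths are illustrative assumptions, not taken from the study; only the 738.82 Mb and 773.36 Mb figures come from the text.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def anchored_percent(anchored_mb, total_mb):
    """Percentage of the assembly that is anchored onto chromosomes."""
    return anchored_mb / total_mb * 100


# Toy contig lengths in Mb (hypothetical, NOT the DM8.1 contigs).
toy_contigs = [60, 50, 40, 30, 20]
print(n50(toy_contigs))  # prints 50

# Figures reported in the text: 738.82 Mb anchored out of 773.36 Mb.
print(f"{anchored_percent(738.82, 773.36):.2f}")  # prints 95.53
```

The second call reproduces the reported 95.53% anchoring rate from the two sizes given in the text.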
Finally, we generated the gap-free genome assembly of DM and named it DM8.1 (Figure 1A; Supplemental Table 4). To verify the quality of the gap-free genome, we investigated the reliability of the DM8.1 sequences that correspond to the 161 gaps in DM6.1. We randomly selected 50 of the 161 gaps and designed 100 pairs of primers (Supplemental Table 5) based on sequences on both sides of these closed gaps for PCR amplification (Supplemental Figure 5) and Sanger sequencing. Both the 5′ and 3′ boundary sequences of these gaps were successfully obtained, indicating the high accuracy of DM8.1. In addition, the DM8.1 genome achieved a BUSCO value of 98.70%; an extremely high mapping rate (>99.90%) for both Illumina short reads and ONT long reads; a high consensus quality value (35.85) from Merqury analysis; and improved long terminal repeat (LTR)-retrotransposon completeness (DM8.1: LAI = 12.92, LTR length = 388.58 Mb; DM6.1: LAI = 12.75, LTR length = 375.91 Mb), further supporting the high quality of DM8.1 (Supplemental Tables 6 and 7). A total of 40 155 protein-coding genes were predicted in DM8.1 (Supplemental Figure 6), of which 33 972 (84.60%) were functionally annotated and 24 362 were expressed, as estimated from the 10 mRNA sequencing datasets. Further analysis identified 1117 genes in DM8.1 that were mis-annotated in DM6.1, in that one gene was incorrectly annotated as two. These errors were revealed by individual mRNA sequencing read pairs covering and linking two mis-annotated neighboring genes, suggesting that they came from a transcript of a single gene (Supplemental Figure 7). A total of 956 349 transposable elements (TEs) were also predicted, accounting for 60.31% (465.81 Mb) of the DM8.1 genome (Supplemental Figure 8; Supplemental Table 8). Additionally, 4676 small RNAs were predicted in DM8.1 (Supplemental Figure 9).
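To put the consensus quality value in context, the sketch below converts a Phred-scaled QV into an estimated per-base error rate, assuming the usual definition QV = -10 log10(E) that Merqury reports. The function name is an illustrative assumption; only the value 35.85 comes from the text.

```python
def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to an estimated
    per-base error probability: E = 10 ** (-QV / 10)."""
    return 10 ** (-qv / 10)


# Consensus QV reported for DM8.1 in the text.
error_rate = qv_to_error_rate(35.85)
print(f"{error_rate:.2e}")  # prints 2.60e-04
```

Under this assumption, a QV of 35.85 corresponds to roughly one consensus error per few thousand bases.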
All telomere regions were detected in DM8.1 using the seven-base telomeric repeat and the sub-telomeric repeats CL14 and CL34, and all centromere regions were identified using CENH3 (Figure 1A). Sequence composition analysis showed that the centromere regions contained more Gypsy-type LTRs (49.25%), while the telomere regions harbored more unknown TEs (Supplemental Figure 8). Additionally, the filled sequences in these 25 gaps showed TE contents similar to those of the centromere regions (Supplemental Figure 8).
The complete genome assembly of DM8.1 facilitated the identification of large tandem gene clusters of functional importance. A total of 181 genes were identified in the newly assembled sequences corresponding to the 161 gap regions in DM6.1. Among these 181 genes, three large clusters (>15 copies) of tandemly duplicated genes were found, comprising 21 patatin genes (Figure 1B), 31 terpene synthase genes, and 18 cupin genes (Supplemental Figure 10). The 21 patatin genes showed much higher expression levels in tubers than in other organs of potato (Figure 1C). Intriguingly, patatin was found to be under absolute dosage selection, because it has expanded continuously during the evolution, domestication, and breeding improvement of potato (Figures 1D–1E). Within the family Solanaceae, patatin was greatly expanded only in potato and slightly expanded in wolfberry (seven copies), whereas other species retained three or fewer copies, and it was completely lost in Physalis and tobacco (Figure 1E). Additionally, Etuberosum, a sister group of potato, has four and five copies of patatin in the two assembled Etuberosum genomes (Figure 1D). This indicates that expansion of patatin gene copies is associated with the speciation of potato and may play an important role in the formation of enlarged tubers. Furthermore, in the reported pan-genomes of tomato and potato (Tang et al., 2022; Zhou et al., 2022), we found that the patatin locus maintained only one or two gene copies in the tomato population but expanded continuously and significantly in the potato population, from diploid wild potato and diploid S. candolleanum to the diploid landraces of potato, with the average copy number growing from 5.9 and 7 to 14.6, respectively (Figure 1D), clearly indicating the expansion of patatin during the domestication of potato. Moreover, these expanded patatin genes were under strong positive selection (Ka/Ks > 1), especially in the domesticated potato genomes (Supplemental Figure 11), indicating functional differentiation of patatin after gene copy expansion, which may be associated with the development, production, and quality improvement of potato tubers. Together, these findings suggest that it is possible to breed potato cultivars with higher yield and quality by manipulating the absolute dosage of patatin, i.e., its gene copy number or expression level.
There have been continuous efforts to improve the reference genome of DM, which is important for both scientific research and breeding programs of potato. In this study, we generated the gap-free telomere-to-telomere genome assembly DM8.1, which could serve as an important resource for future genomics and gene function studies in potato.
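The text does not describe how the tandem clusters (21 patatin, 31 terpene synthase, and 18 cupin genes) were delineated. As a purely illustrative sketch, one common heuristic groups genes of the same family that lie next to each other along a chromosome, tolerating a small number of intervening genes. The function name, the `max_intervening` and `min_copies` parameters, and the toy gene order below are assumptions, not the authors' method or data.

```python
def tandem_clusters(families, target, max_intervening=1, min_copies=2):
    """Find tandem clusters of `target` in an ordered list of gene-family
    labels (one label per gene, in chromosomal order).

    Consecutive target genes join the same cluster when at most
    `max_intervening` non-target genes lie between them. Clusters with
    fewer than `min_copies` members are discarded. Returns a list of
    clusters, each a list of gene indices.
    """
    positions = [i for i, fam in enumerate(families) if fam == target]
    clusters, current = [], []
    for pos in positions:
        # Start a new cluster when the spacer to the previous copy is too long.
        if current and pos - current[-1] - 1 > max_intervening:
            if len(current) >= min_copies:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_copies:
        clusters.append(current)
    return clusters


# Hypothetical gene order on one chromosome (NOT data from the study).
order = ["patatin", "patatin", "other", "patatin",
         "other", "other", "other", "patatin"]
print(tandem_clusters(order, "patatin"))  # prints [[0, 1, 3]]
```

In this toy input the last patatin copy is separated from the others by three genes, so it is excluded from the cluster; applying a filter such as `min_copies=16` would mirror the ">15 copies" threshold mentioned in the text.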
This work was supported by the National Natural Science Foundation of China (32072119 and 31801421); the Breeding Program of Shandong Province, China (2020LZGC003); the National Agriculture Science and Technology Major Program, China (NK20220904); the China Agricultural Research System (CARS-9); the Central Public-interest Scientific Institution Basal Research Fund (Y2022PT23); and the Innovation Program of Chinese Academy of Agricultural Sciences (CAAS-ASTIP-IVFCAAS).