The complete telomere-to-telomere genome assembly of lettuce

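Keywords: Telomere; Telomere-binding protein; Genome; Biology; Genetics; DNA; Computational biology; Gene; DNA-binding protein; Transcription factor
Authors
Ke Wang, Jingyun Jin, Jingxuan Wang, Xinrui Wang, Jie Sun, Dian Meng, Xiangfeng Wang, Yong Wang, Li Guo
Source
Journal: Plant Communications [Elsevier BV]
Volume (issue): 5 (10): 101011-101011 Citations: 3
Identifier
DOI:10.1016/j.xplc.2024.101011
Abstract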

Lettuce (Lactuca sativa L.) is an annual plant of the Asteraceae family, commonly used as a fresh-cut vegetable and a primary ingredient in salads. It is rich in vitamins, minerals, polyphenols, and carotenoids, providing numerous health benefits. In 2021, lettuce achieved a gross production value of $16.6 billion worldwide, with China, the United States, and Western Europe as the leading producers (Food and Agriculture Organization of the United Nations, 2023). Most cultivated lettuce varieties are inbred (2n = 18) and exhibit limited genetic diversity, rendering them susceptible to various abiotic and biotic stresses (Richard, 2004; Galieni et al., 2015). Hence, lettuce breeding efforts focus primarily on improving yield, quality, and disease resistance, and depend heavily on genetic and genomic resources such as molecular markers, reference genomes, and multi-omics data.
The first lettuce genome was assembled using next-generation sequencing (NGS) reads in 2017 (Reyes-Chin-Wo et al., 2017). In 2022, the improved lettuce reference genome v11 of the crisphead lettuce cultivar Salinas (GCA_002870075.4) was released; subsequently, Shen et al. (2023) assembled the genome of stem lettuce (L. sativa var. Augustana). Although these assemblies have greatly facilitated lettuce research (Wei et al., 2021; Gao et al., 2022; Pink et al., 2022; Shen et al., 2023), they remain highly fragmented and incomplete, containing hundreds of gaps and omitting key genetic elements such as centromeres, rDNA, and telomeres, which continues to hinder progress in genomic research, gene cloning, and molecular breeding. Here, we report the first complete telomere-to-telomere (T2T) genome of L. sativa cv. PKU06 (Figure 1A), which is widely cultivated and consumed.
The assembly was based on 112.4× coverage of PacBio high-fidelity (HiFi) long reads, 42.9× coverage of Oxford Nanopore Technologies (ONT) ultra-long reads (N50 > 100 kb), and 118.8× coverage of Hi-C reads (Supplemental Table 1). Genome assembly was performed using an in-house pipeline (Supplemental Figure 1) as follows. First, the HiFi and ONT reads were assembled using hifiasm, resulting in a draft genome of 125 contigs. After removing microbial and plastid sequences, these contigs were anchored to nine chromosomes using Hi-C data (Supplemental Figure 2). Misplaced or misoriented contigs were manually corrected in Juicebox. This yielded a chromosome-scale assembly with only two remaining gaps on Chr4, which were subsequently filled with the ONT reads to achieve a gap-free assembly (Supplemental Figure 3). The two nucleolus organizer regions (NORs) on Chr1 and Chr8 were successfully resolved, containing a total of 8.63 Mb of rDNA repeat arrays with 884 copies (Figure 1B). The final complete T2T genome (LsT2T) (Figure 1A) is 2593 Mb in size with a contig N50 of 320.7 Mb, which is 2565.6% of the 12.5-Mb contig N50 of Salinas, that is, about 25.7 times as long (Supplemental Table 2). In addition, we identified all 18 telomeres using the seven-base telomere repeats (CCCTAAA and TTTAGGG) (Supplemental Table 3). LsT2T showed high synteny (96.96%) with the Salinas genome, although it displayed structural variants that likely reflect differences between the two cultivars (Supplemental Figure 4). Notably, LsT2T closed 384 gaps present in the Salinas genome, substantially improving the contiguity of the lettuce genome (Supplemental Table 2).
Extensive validation confirmed the accuracy of LsT2T. First, the Hi-C interaction map of LsT2T showed no obvious structural assembly errors (Supplemental Figure 2). Second, the alignment of all raw sequencing data to LsT2T yielded mapping rates of 99.9%, 96.4%, and 99.9% for HiFi, ONT, and NGS reads, respectively (Supplemental Table 1).
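Two of the statistics quoted in the assembly description, the contig N50 and the presence of terminal telomere repeats, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the window size, and the toy inputs are assumptions made for illustration. Only the repeat motifs (CCCTAAA and TTTAGGG) and the two N50 values (320.7 Mb and 12.5 Mb) come from the text, and placing CCCTAAA at the start and TTTAGGG at the end of each chromosome sequence is assumed here as the usual convention.

```python
def contig_n50(lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def fold_ratio(new_value, old_value):
    """How many times larger new_value is than old_value."""
    return new_value / old_value


def telomere_repeat_counts(seq, window=1000,
                           start_motif="CCCTAAA", end_motif="TTTAGGG"):
    """Count telomere repeat units in the first and last `window` bases.

    `window` is a hypothetical choice, not a value from the article.
    Returns (count at sequence start, count at sequence end).
    """
    seq = seq.upper()
    head, tail = seq[:window], seq[-window:]
    return head.count(start_motif), tail.count(end_motif)


if __name__ == "__main__":
    # N50 values quoted in the text (Mb): LsT2T versus Salinas.
    print(f"N50 ratio: {fold_ratio(320.7, 12.5):.2f}x")

    # Toy chromosome: 50 repeat units at each end around a filler sequence.
    toy = "CCCTAAA" * 50 + "ACGT" * 200 + "TTTAGGG" * 50
    print(telomere_repeat_counts(toy, window=350))
```

With the values from the text, the ratio comes out at roughly 25.7, which corresponds to an increase of about 2466% over the Salinas contig N50. A chromosome would be counted as telomere-complete in this sketch when both returned counts are well above zero; the cutoff itself is a choice the article does not specify.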
Uniform genome-wide read coverage (Figure 1A) indicated a complete and highly accurate assembly. Interestingly, we observed sporadic regions of elevated ONT read coverage (Figure 1A; Supplemental Table 4) corresponding to chloroplast sequences, suggesting the integration of plastid DNA into the nuclear genome. Furthermore, LsT2T has a quality value of 58 and a BUSCO score of 97.6% (Supplemental Table 2), demonstrating its high accuracy and completeness.
Approximately 2.1 Gb of repetitive elements (REs), constituting 81.4% of the LsT2T genome, were annotated, composed predominantly of transposable elements (TEs) (Figure 1C; Supplemental Table 5). Notably, the majority of these TEs were LTR retrotransposons, with Gypsy and Copia elements representing 37.84% and 27.23% of the LsT2T genome, respectively. A total of 45507 protein-coding genes (Supplemental Table 6) were predicted in LsT2T using ab initio prediction, comparison with homologous proteins, and transcriptomic data from five different tissues sequenced using NGS and PacBio Iso-Seq. Of these genes, 48.8% were functionally annotated using eggNOG-mapper, and 57.3% were expressed in at least one tissue at a threshold of TPM ≥ 1 (Supplemental Table 6). Analysis of the sequences newly assembled in LsT2T relative to the Salinas genome revealed that they consisted of 2.09% genes, 31.34% REs, 16.9% centromeres, and 43.4% rDNA arrays (Supplemental Figure 5B), highlighting the importance of a complete genome for uncovering essential genomic regions. In addition, comparative analysis of the protein-coding genes in the LsT2T, Salinas, and Augustana genomes through orthogroup identification revealed a high degree of similarity across the three genome annotations, despite differences in cultivar type, assembly quality, and annotation pipeline.
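To give the quality value of 58 reported above a concrete meaning, the following Python sketch converts it to a per-base error probability. It assumes the value is a Phred-scaled consensus quality (QV = -10 log10 of the error probability), which is the usual convention for assembly QV; the article does not state the tool or formula used, and the function names are illustrative.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability back to a Phred-scaled value."""
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    rate = qv_to_error_rate(58)  # QV reported for LsT2T
    print(f"per-base error probability: {rate:.2e}")
    print(f"bases per expected error:   {1 / rate:,.0f}")
```

Under this assumption, a QV of 58 corresponds to an error probability of about 1.6 × 10^-6, or roughly one expected consensus error per 630 kb.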
LsT2T and Salinas (leaf lettuce) were more similar to each other than to Augustana (stem lettuce) in terms of the number of shared orthogroups (Supplemental Figure 5C).
Centromeres, which are repeat-rich heterochromatic regions, are critical for accurate chromosome segregation during cell division (Cleveland et al., 2003). The centromeres of lettuce were identified through ChIP-seq profiling using a lettuce-specific CENH3 (centromere-specific histone H3) antibody, which clearly delineated the boundaries of the nine centromeres (Figure 1D; Supplemental Table 7), ranging in size from 2.7 Mb (Chr6) to 4.5 Mb (Chr7). The position of the centromeres varied across chromosomes, with the ratio of long-arm to short-arm length ranging from 1.1 (Chr6) to 3.2 (Chr8) (Figure 1A; Supplemental Figure 4A). Sequence similarity among the centromeres was low (Supplemental Figure 6), suggesting strong diversification. Centromeric repeats consisted predominantly of Gypsy (56.6%), Copia (13.1%), and satellites (16.3%), a composition that differs from that of non-centromeric regions (Figure 1C). In addition, centromeric Gypsy elements were dominated by Tekay, Angela, and centromeric retrotransposons of maize (CRMs) (Supplemental Figure 7A). Notably, CRMs appeared more frequently in centromeric than in non-centromeric regions, consistent with previous reports for maize and cotton (Chen et al., 2023; Chang et al., 2024). Phylogenetic analysis of Gypsy elements revealed that centromeric CRMs formed a unique clade, suggesting an expansion of centromeric CRMs distinct from that of non-centromeric CRMs (Supplemental Figure 7B). The proportion of satellites in the centromeres varied from 3.25% (Chr3) to 60.14% (Chr1) (Figure 1D; Supplemental Table 7). De novo identification of centromeric satellite monomers using TRASH revealed 30-bp, 62-bp, 287-bp, and 123-bp monomers as the predominant satellites (Supplemental Figure 7C). We also observed higher-order repeats (Figure 1E; Supplemental Figure 8), primarily composed of 62-bp monomers along with miscellaneous short repeats (Supplemental Figure 7C). Analysis of CENH3 enrichment demonstrated that CENH3 preferentially binds to Gypsy elements and satellite sequences (Figure 1E; Supplemental Figures 8 and 9), highlighting their importance in centromere function.
Although the lettuce genome has been decoded, its 3D genomic landscape remains largely unexplored. We used miniMDS to model the 3D structure of the lettuce genome from high-resolution Hi-C data (Supplemental Figure 10). The 2.59-Gb lettuce genome is organized into topologically associating domains (TADs) and A/B compartments, with a low frequency of A/B compartment switching. Notably, all centromeres were localized in the B compartment (Figure 1E; Supplemental Figure 11). The A compartment showed a higher gene density and a lower TE density than the B compartment, and the two compartments displayed distinct epigenetic marks (Figure 1E; Supplemental Figure 11).
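The gene-density comparison between A and B compartments can be sketched as follows. This is only an illustration of the kind of calculation involved, not the authors' method: the function name, the half-open interval convention, and the toy coordinates are all hypothetical, and no values from the article are used.

```python
def gene_density_by_compartment(bins, gene_positions):
    """Return genes per Mb for each compartment label.

    bins: iterable of (start, end, label) half-open intervals on one
          chromosome, where label is e.g. "A" or "B".
    gene_positions: iterable of gene start coordinates on that chromosome.
    """
    total_bp = {}
    total_genes = {}
    for start, end, label in bins:
        total_bp[label] = total_bp.get(label, 0) + (end - start)
        hits = sum(1 for pos in gene_positions if start <= pos < end)
        total_genes[label] = total_genes.get(label, 0) + hits
    return {
        label: total_genes[label] / (total_bp[label] / 1_000_000)
        for label in total_bp
        if total_bp[label] > 0
    }


if __name__ == "__main__":
    # Toy compartment calls and gene starts (hypothetical coordinates).
    toy_bins = [
        (0, 1_000_000, "A"),
        (1_000_000, 3_000_000, "B"),
        (3_000_000, 4_000_000, "A"),
    ]
    toy_genes = [100, 5_000, 900_000, 1_500_000, 3_200_000, 3_900_000]
    print(gene_density_by_compartment(toy_bins, toy_genes))
```

The same function could be applied to TE start coordinates to compare TE density between compartments.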
ChIP-seq analysis of histone modifications revealed that H3K4me3 and H3K27me3, which mark transcriptional activation and repression, respectively, were enriched in A compartments, whereas B compartments showed enrichment for H3K9me2, which is typically associated with heterochromatin (Figure 1E; Supplemental Figure 11). This pattern is consistent with those observed in most plant 3D genomes reported thus far.
Given the susceptibility of cultivated lettuce to diseases, developing disease-resistant cultivars is crucial for environment-friendly disease management. Nucleotide-binding site leucine-rich repeat (NLR) proteins are crucial for plant immunity against pathogens (Chou et al., 2023). Our systematic analysis identified 514 putative NLR genes in the LsT2T genome, which were classified into seven subfamilies based on a phylogenetic analysis of the NB-ARC domain (Figure 1F), indicating high phylogenetic diversity. By contrast, the same approach identified only 484 NLR genes in the v11 genome. The majority of NLR genes in the LsT2T genome were tandemly duplicated and genomically clustered, particularly on Chr1 and Chr2 (Figure 1G). Interestingly, four new NLRs were identified in the filled gap regions of LsT2T (Figure 1H; Supplemental Figure 12), including one located within a gap region of Chr4 that was covered exclusively by ONT reads mapped to LsT2T. Transcriptomic analysis of the 514 NLR genes (Supplemental Table 8) revealed that 58 of them were significantly upregulated during gray mold (Botrytis cinerea) infection compared with mock treatment, among which the 38 genes encoding TIR-NB-ARC(-LRR) domains predominated (Figure 1F; Supplemental Table 9).
The most significantly upregulated NLR gene, lettuce_v2_00029769, is homologous to the Arabidopsis thaliana gene AT5G36930, which encodes a TIR-NB-ARC-LRR-type NLR. Future functional characterization of these infection-induced NLR genes, as revealed by the T2T genome, will provide deeper insights into the mechanisms of lettuce immunity against pathogens.
In summary, we generated the complete T2T genome of lettuce, the first for Asterids, and thoroughly dissected the complex genetic and epigenetic landscape of its centromeres. This genome will serve as an essential resource for advancing lettuce research and facilitating genetic improvement.
All raw sequencing data generated for this project have been deposited in the China National Center for Bioinformation under accession number CRA014517, accessible at https://ngdc.cncb.ac.cn/gsa/s/Pya57yDW. The genome assembly and annotation are available on Figshare at https://figshare.com/s/f5f0e8068d5a236ea408.
This project was supported by the Key R&D Program of Shandong Province (ZR202211070163) and the Natural Science Foundation for Distinguished Young Scholars of Shandong Province (ZR2023JQ010). L.G. is also supported by the Taishan Scholars Program of Shandong Province.