Abstract
Lettuce (Lactuca sativa L.) is an annual plant of the Asteraceae family, commonly used as a fresh-cut vegetable and a primary ingredient in salads. It is rich in vitamins, minerals, polyphenols, and carotenoids, providing numerous health benefits. In 2021, lettuce achieved a gross production value of $16.6 billion worldwide, with China, the United States, and Western Europe as the leading producers (Food and Agriculture Organization of the United Nations, 2023). Most cultivated lettuce varieties are inbred (2n = 18) and exhibit limited genetic diversity, rendering them susceptible to various abiotic and biotic stresses (Richard, 2004; Galieni et al., 2015). Hence, lettuce breeding efforts focus primarily on improving yield, quality, and disease resistance, and they depend heavily on genetic and genomic resources such as molecular markers, reference genomes, and multi-omics data. The first lettuce genome was assembled using next-generation sequencing (NGS) reads in 2017 (Reyes-Chin-Wo et al., 2017). In 2022, the improved lettuce reference genome v11 of the crisphead lettuce cultivar Salinas (GCA_002870075.4) was released; subsequently, Shen et al. (2023) assembled the genome of stem lettuce (L. sativa var. Augustana). Although these assemblies have greatly facilitated lettuce research (Wei et al., 2021; Gao et al., 2022; Pink et al., 2022; Shen et al., 2023), they remain highly fragmented and incomplete, containing hundreds of gaps and omitting key genetic elements such as centromeres, rDNA, and telomeres, which continues to hinder progress in genomic research, gene cloning, and molecular breeding. Here, we report the first complete telomere-to-telomere (T2T) genome of L. sativa cv. PKU06 (Figure 1A), which is widely cultivated and consumed.
The assembly was built from 112.4× coverage of PacBio high-fidelity (HiFi) long reads, 42.9× coverage of Oxford Nanopore Technology (ONT) ultra-long reads (N50 > 100 kb), and 118.8× coverage of Hi-C reads (Supplemental Table 1). Genome assembly was performed using an in-house pipeline (Supplemental Figure 1) as follows. First, the HiFi and ONT reads were assembled using hifiasm, resulting in a draft genome of 125 contigs. After removal of microbial and plastid sequences, these contigs were anchored to nine chromosomes using Hi-C data (Supplemental Figure 2). Misplaced or misoriented contigs were manually corrected in Juicebox. This yielded a chromosome-scale assembly with only two remaining gaps on Chr4, which were subsequently filled with ONT reads to achieve a gap-free assembly (Supplemental Figure 3). The two nucleolus organizer regions (NORs) on Chr1 and Chr8 were successfully resolved and contain a total of 8.63 Mb of rDNA repeat arrays comprising 884 copies (Figure 1B). The final complete T2T genome (LsT2T) (Figure 1A) is 2593 Mb in size with a contig N50 of 320.7 Mb, about 25.7 times (2565.6% of) the 12.5 Mb of Salinas (Supplemental Table 2). In addition, we identified all 18 telomeres using the seven-base telomere repeats (CCCTAAA and TTTAGGG) (Supplemental Table 3). LsT2T showed high synteny (96.96%) with the Salinas genome, although it displayed structural variants, likely due to differences between the two cultivars (Supplemental Figure 4). Notably, LsT2T closed 384 gaps present in the Salinas genome, substantially improving the contiguity of the lettuce genome (Supplemental Table 2).
Extensive validation confirmed the accuracy of LsT2T. First, the Hi-C interaction map of LsT2T showed no obvious structural assembly errors (Supplemental Figure 2). Second, the alignment of all raw sequencing data to LsT2T yielded mapping rates of 99.9%, 96.4%, and 99.9% for HiFi, ONT, and NGS reads, respectively (Supplemental Table 1).
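As an illustrative aside, the contig N50 and the telomere-repeat check mentioned above can be sketched in a few lines of Python. This is not the pipeline used for LsT2T: the function names, the window size, the three-copy cutoff, and the toy sequence are assumptions made for the example. Only the repeat motifs (CCCTAAA and TTTAGGG) and the two N50 values (320.7 Mb and 12.5 Mb) come from the text.

```python
import re

TELOMERE_MOTIFS = ("CCCTAAA", "TTTAGGG")  # seven-base repeats named in the text


def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def longest_tandem_run(seq, motif):
    """Return the largest number of back-to-back copies of motif in seq."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def telomeric_ends(seq, window=100, min_copies=3):
    """Report whether the first and last `window` bases each contain at
    least `min_copies` tandem copies of either telomere motif.
    `window` and `min_copies` are illustrative values, not the paper's."""
    def has_repeat(fragment):
        return any(
            longest_tandem_run(fragment, motif) >= min_copies
            for motif in TELOMERE_MOTIFS
        )
    return has_repeat(seq[:window]), has_repeat(seq[-window:])


# Toy chromosome (hypothetical): telomere repeats flanking filler sequence.
toy_chromosome = "CCCTAAA" * 5 + "ACGT" * 10 + "TTTAGGG" * 4

print(n50([80, 70, 50, 40, 30, 20, 10]))
print(telomeric_ends(toy_chromosome, window=50))
print(round(320.7 / 12.5, 1))  # ratio of the two N50 values quoted in the text
```

Real chromosome ends would call for a larger window and copy cutoff than this toy case; the values here are chosen only so that the example is easy to follow by hand.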
Uniform genome-wide read coverage (Figure 1A) indicated a complete and highly accurate assembly. Interestingly, we observed sporadic regions of elevated ONT read coverage (Figure 1A; Supplemental Table 4) corresponding to chloroplast sequences, suggesting the integration of plastid DNA into the nuclear genome. Furthermore, LsT2T has a quality value of 58 and a BUSCO score of 97.6% (Supplemental Table 2), demonstrating its high accuracy and completeness.
Approximately 2.1 Gb of repetitive elements (REs), constituting 81.4% of the LsT2T genome, were annotated; these were composed predominantly of transposable elements (TEs) (Figure 1C; Supplemental Table 5). Notably, the majority of these TEs were LTR retrotransposons, with Gypsy and Copia elements representing 37.84% and 27.23% of the LsT2T genome, respectively. A total of 45507 protein-coding genes (Supplemental Table 6) were predicted in LsT2T using ab initio prediction, homology to known proteins, and transcriptomic data from five tissues sequenced using NGS and PacBio Iso-seq. Of these genes, 48.8% were functionally annotated using eggNOG-mapper, and 57.3% were expressed in at least one tissue at a threshold of TPM ≥ 1 (Supplemental Table 6). Analysis of the sequences newly assembled in LsT2T relative to the Salinas genome revealed that they consisted of 2.09% genes, 31.34% REs, 16.9% centromeres, and 43.4% rDNA arrays (Supplemental Figure 5B), highlighting the value of a complete genome for uncovering essential genomic regions. In addition, orthogroup-based comparison of the protein-coding genes in the LsT2T, Salinas, and Augustana genomes revealed a high degree of similarity across the three genome annotations, despite differences in cultivar type, assembly quality, and annotation pipeline.
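The expression criterion used for the gene annotation (TPM ≥ 1 in at least one tissue) can be made concrete with a short sketch. The code below assumes the standard definition of transcripts per million, in which each read count is divided by the transcript length and the resulting rates are rescaled so that a sample sums to one million. The function names, tissue labels, and numbers are hypothetical and do not reproduce the quantification actually used for LsT2T.

```python
def tpm(counts, lengths_bp):
    """Convert read counts to TPM for one sample.

    counts and lengths_bp are parallel lists, one entry per gene."""
    rates = [count / length for count, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    if scale == 0:
        return [0.0] * len(rates)
    return [rate / scale * 1e6 for rate in rates]


def fraction_expressed(tpm_by_tissue, threshold=1.0):
    """Fraction of genes with TPM >= threshold in at least one tissue.

    tpm_by_tissue maps a tissue name to a list of TPM values, with the
    same gene order in every list."""
    samples = list(tpm_by_tissue.values())
    n_genes = len(samples[0])
    expressed = sum(
        any(sample[i] >= threshold for sample in samples)
        for i in range(n_genes)
    )
    return expressed / n_genes


# Hypothetical TPM values for four genes in two tissues.
toy_tpm = {
    "leaf": [5.0, 0.2, 0.0, 1.0],
    "root": [0.0, 0.4, 0.0, 3.0],
}

print(fraction_expressed(toy_tpm))
print(tpm([10, 20, 30], [1000, 2000, 3000]))
```

Taking the maximum across tissues, rather than the mean, is what makes the criterion "at least one tissue": a gene expressed only in roots still counts as expressed.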
LsT2T and Salinas (leaf lettuce) were more similar to each other than to Augustana (stem lettuce) in terms of the number of shared orthogroups (Supplemental Figure 5C).
Centromeres, which are repeat-rich heterochromatic regions, are critical for accurate chromosome segregation during cell division (Cleveland et al., 2003). The centromeres of lettuce were identified through ChIP-seq profiling using a lettuce-specific CENH3 (centromere-specific histone H3) antibody, which clearly delineated the boundaries of the nine centromeres (Figure 1D; Supplemental Table 7), ranging in size from 2.7 Mb (Chr6) to 4.5 Mb (Chr7). The position of the centromere varied across chromosomes, with the long-arm to short-arm ratio ranging from 1.1 (Chr6) to 3.2 (Chr8) (Figure 1A; Supplemental Figure 4A). Sequence similarity among the centromeres was low (Supplemental Figure 6), suggesting strong diversification. Centromeric repeats consisted predominantly of Gypsy (56.6%), Copia (13.1%), and satellites (16.3%), a composition that differs from that of non-centromeric regions (Figure 1C). In addition, centromeric Gypsy elements were dominated by Tekay, Angela, and centromeric retrotransposons of maize (CRMs) (Supplemental Figure 7A). Notably, CRMs appeared more frequently in centromeric than in non-centromeric regions, consistent with previous reports for maize and cotton (Chen et al., 2023; Chang et al., 2024). Phylogenetic analysis of Gypsy elements revealed that centromeric CRMs formed a unique clade, suggesting that centromeric CRMs expanded separately from non-centromeric CRMs (Supplemental Figure 7B). The proportion of satellites in the centromeres varied from 3.25% (Chr3) to 60.14% (Chr1) (Figure 1D; Supplemental Table 7). De novo identification of centromeric satellite monomers using TRASH revealed 30-bp, 62-bp, 287-bp, and 123-bp monomers as the predominant satellites (Supplemental Figure 7C). We also observed higher-order repeats (Figure 1E; Supplemental Figure 8), primarily composed of 62-bp monomers along with miscellaneous short repeats (Supplemental Figure 7C). Analysis of CENH3 enrichment demonstrated that CENH3 preferentially binds to Gypsy elements and satellite sequences (Figure 1E; Supplemental Figures 8 and 9), highlighting their importance in centromere function.
Although the lettuce genome has been decoded, its 3D genomic landscape remains largely unexplored. We used miniMDS to model the 3D structure of the lettuce genome from high-resolution Hi-C data (Supplemental Figure 10). The 2.59-Gb lettuce genome is organized into topologically associating domains (TADs) and A/B compartments, with a low frequency of A/B compartment switching. Notably, all centromeres were localized in the B compartment (Figure 1E; Supplemental Figure 11). The A compartment had a higher gene density and a lower TE density than the B compartment, and the two compartments displayed distinct epigenetic marks (Figure 1E; Supplemental Figure 11).
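To illustrate how a comparison of gene and TE density between the A and B compartments can be quantified, the following sketch aggregates per-bin annotations by compartment. The bin tuples, their values, and the function name are hypothetical; the densities reported here derive from the Hi-C and annotation data of LsT2T, not from this code.

```python
from collections import defaultdict


def compartment_summary(bins):
    """Summarize gene density and TE content per compartment.

    bins is an iterable of (compartment, bin_length_bp, gene_count, te_bp)
    tuples, where compartment is a label such as "A" or "B"."""
    totals = defaultdict(lambda: {"bp": 0, "genes": 0, "te_bp": 0})
    for compartment, length_bp, gene_count, te_bp in bins:
        entry = totals[compartment]
        entry["bp"] += length_bp
        entry["genes"] += gene_count
        entry["te_bp"] += te_bp
    return {
        compartment: {
            "genes_per_mb": entry["genes"] / (entry["bp"] / 1e6),
            "te_fraction": entry["te_bp"] / entry["bp"],
        }
        for compartment, entry in totals.items()
    }


# Hypothetical 1-Mb bins: (compartment, length, genes, TE base pairs).
toy_bins = [
    ("A", 1_000_000, 40, 600_000),
    ("A", 1_000_000, 30, 700_000),
    ("B", 1_000_000, 5, 900_000),
    ("B", 1_000_000, 10, 950_000),
]

summary = compartment_summary(toy_bins)
for compartment in sorted(summary):
    print(compartment, summary[compartment])
```

Summing base pairs before dividing, instead of averaging per-bin densities, keeps the result correct when bins differ in length, for example at chromosome ends.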
ChIP-seq analysis of histone modifications revealed that H3K4me3 and H3K27me3, which mark transcriptional activation and repression, respectively, were enriched in A compartments, whereas B compartments showed enrichment for H3K9me2, which is typically associated with heterochromatin (Figure 1E; Supplemental Figure 11). This pattern is consistent with those observed in most plant 3D genomes reported thus far.
Given the susceptibility of cultivated lettuce to diseases, developing disease-resistant cultivars is crucial for environmentally friendly disease management. Nucleotide-binding site leucine-rich repeat (NLR) proteins are central to plant immunity against pathogens (Chou et al., 2023). Our systematic analysis identified 514 putative NLR genes in the LsT2T genome, which were classified into seven subfamilies based on a phylogenetic analysis of the NB-ARC domain (Figure 1F), indicating high phylogenetic diversity. By contrast, the same approach identified only 484 NLR genes in the v11 genome. The majority of NLR genes in the LsT2T genome were tandemly duplicated and genomically clustered, particularly on Chr1 and Chr2 (Figure 1G). Interestingly, four new NLRs were identified in the filled gap regions of LsT2T (Figure 1H; Supplemental Figure 12), including one located within a gap region of Chr4 that was covered exclusively by ONT reads mapped to LsT2T. Transcriptomic analysis of the 514 NLR genes (Supplemental Table 8) revealed that 58 of these genes were significantly upregulated during gray mold (Botrytis cinerea) infection compared with mock treatment, and that the upregulated set was dominated by 38 genes encoding TIR-NB-ARC(-LRR) domains (Figure 1F; Supplemental Table 9).
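A selection of infection-induced genes of this kind is commonly expressed as a filter on differential-expression results. The sketch below is only an assumption-laden illustration: the cutoffs (log2 fold change ≥ 1, adjusted P < 0.05) are widely used conventions rather than values stated in the text, and the record type, function name, and gene identifiers are hypothetical.

```python
from typing import NamedTuple


class DEResult(NamedTuple):
    """One gene's infection-versus-mock comparison (hypothetical layout)."""
    gene_id: str
    log2_fold_change: float
    padj: float


def upregulated(results, min_log2fc=1.0, max_padj=0.05):
    """Return significantly upregulated genes, most significant first.

    The default cutoffs are conventional choices assumed for this
    example; they are not taken from the paper."""
    hits = [
        r for r in results
        if r.log2_fold_change >= min_log2fc and r.padj < max_padj
    ]
    # Smallest adjusted P first; ties broken by larger fold change.
    return sorted(hits, key=lambda r: (r.padj, -r.log2_fold_change))


# Hypothetical results for five genes.
toy_results = [
    DEResult("nlr_a", 3.2, 1e-8),
    DEResult("nlr_b", 0.4, 1e-4),   # significant but below fold-change cutoff
    DEResult("nlr_c", 2.1, 0.20),   # large change but not significant
    DEResult("nlr_d", 1.5, 1e-3),
    DEResult("nlr_e", -2.0, 1e-6),  # downregulated
]

hits = upregulated(toy_results)
print([r.gene_id for r in hits])
```

Ranking by adjusted P value is one reading of "most significantly upregulated"; ranking by fold change would be an equally defensible choice and only requires changing the sort key.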
The most significantly upregulated NLR gene, lettuce_v2_00029769, is homologous to Arabidopsis thaliana AT5G36930, which encodes a TIR-NB-ARC-LRR-type NLR. Future functional characterization of these infection-induced NLR genes, identified with the aid of the T2T genome, will provide deeper insights into the mechanisms of lettuce immunity against pathogens.
In summary, we generated the complete T2T genome of lettuce, the first for Asterids, and thoroughly dissected the complex genetic and epigenetic landscape of its centromeres. This genome will serve as an essential resource for advancing lettuce research and facilitating genetic improvement.
All raw sequencing data generated for this project have been deposited in the China National Center for Bioinformation under accession number CRA014517 and are accessible at https://ngdc.cncb.ac.cn/gsa/s/Pya57yDW. The genome assembly and annotation are available on Figshare at https://figshare.com/s/f5f0e8068d5a236ea408.
This project was supported by the Key R&D Program of Shandong Province (ZR202211070163) and the Natural Science Foundation for Distinguished Young Scholars of Shandong Province (ZR2023JQ010). L.G. is also supported by the Taishan Scholars Program of Shandong Province.