The complete telomere-to-telomere genome assembly of lettuce

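Telomere, Telomere-binding protein, Genome, Biology, Genetics, DNA, Computational biology, Gene, DNA-binding protein, Transcription factor
Authors
Ke Wang, Jingyun Jin, Jingxuan Wang, Sheng Wang, Jie Sun, Dian Meng, Xiangfeng Wang, Li Wang, Li Guo
Source
Journal: Plant Communications [Elsevier]
Article number: 101011
Identifier
DOI: 10.1016/j.xplc.2024.101011
Abstract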

Lettuce (Lactuca sativa L.) is an annual plant of the Asteraceae family, commonly used as a fresh-cut vegetable and a primary ingredient in salads. It is rich in vitamins, minerals, polyphenols, and carotenoids, providing numerous health benefits. In 2021, lettuce achieved a gross production value of $16.6 billion worldwide, with China, the United States, and Western Europe as the leading producers (Food and Agriculture Organization of the United Nations, 2023). Most cultivated lettuce varieties are inbred (2n = 18) and exhibit limited genetic diversity, rendering them susceptible to various abiotic and biotic stresses (Richard, 2004; Galieni et al., 2015). Hence, lettuce breeding efforts primarily focus on improving yield, quality, and disease resistance, and depend heavily on genetic and genomic resources such as molecular markers, reference genomes, and multi-omics data. The first lettuce genome was assembled from next-generation sequencing (NGS) reads in 2017 (Reyes-Chin-Wo et al., 2017). In 2022, the improved lettuce reference genome v11 of the crisphead lettuce cultivar Salinas (GCA_002870075.4) was released; subsequently, Shen et al. (2023) assembled the genome of stem lettuce (L. sativa var. Augustana). Although these assemblies have greatly facilitated lettuce research (Wei et al., 2021; Gao et al., 2022; Pink et al., 2022; Shen et al., 2023), they remain highly fragmented and incomplete: they contain hundreds of gaps and omit key genetic elements such as centromeres, rDNA, and telomeres, which continues to hinder progress in genomic research, gene cloning, and molecular breeding. Here, we report the first complete telomere-to-telomere (T2T) genome of L. sativa cv. PKU06 (Figure 1A), a widely cultivated and consumed cultivar.
The assembly used 112.4× coverage of PacBio high-fidelity (HiFi) long reads, 42.9× coverage of Oxford Nanopore Technologies (ONT) ultra-long reads (N50 > 100 kb), and 118.8× coverage of Hi-C reads (Supplemental Table 1). Genome assembly was performed with an in-house pipeline (Supplemental Figure 1) as follows. First, the HiFi and ONT reads were assembled using hifiasm, yielding a draft genome of 125 contigs. After microbial and plastid sequences were removed, these contigs were anchored to nine chromosomes using Hi-C data (Supplemental Figure 2). Misplaced or mis-oriented contigs were manually corrected in Juicebox. This yielded a chromosome-scale assembly with only two remaining gaps, both on Chr4, which were subsequently filled with ONT reads to achieve a gap-free assembly (Supplemental Figure 3). The two nucleolus organizer regions (NORs) on Chr1 and Chr8 were fully resolved, containing a total of 8.63 Mb of rDNA repeat arrays with 884 copies (Figure 1B). The final complete T2T genome (LsT2T) (Figure 1A) is 2593 Mb in size with a contig N50 of 320.7 Mb, about 25.7 times the 12.5-Mb contig N50 of Salinas (Supplemental Table 2). In addition, we identified all 18 telomeres using the seven-base telomere repeats (CCCTAAA and TTTAGGG) (Supplemental Table 3). LsT2T showed high synteny (96.96%) with the Salinas genome, although it displayed structural variants likely due to differences between the two cultivars (Supplemental Figure 4). Notably, LsT2T closed 384 gaps present in the Salinas genome, substantially improving the contiguity of the lettuce genome (Supplemental Table 2).

Extensive validation confirmed the accuracy of LsT2T. First, the Hi-C interaction map of LsT2T showed no obvious structural assembly errors (Supplemental Figure 2). Second, aligning all raw sequencing data to LsT2T yielded mapping rates of 99.9%, 96.4%, and 99.9% for HiFi, ONT, and NGS reads, respectively (Supplemental Table 1).
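Two of the summary statistics above, contig N50 and telomere detection, can be made concrete with a short sketch. The Python below computes a contig N50 and checks whether each chromosome end carries the seven-base plant telomere repeat. It is illustrative only: the paper does not name its telomere-calling tool, and the window size (`window`), the coverage cutoff (`min_fraction`), and all function names are hypothetical choices made for this example.

```python
def contig_n50(lengths):
    """Return N50: the length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def repeat_fraction(seq, motif):
    """Fraction of bases in seq covered by non-overlapping exact copies of motif."""
    k = len(motif)
    copies = 0
    i = 0
    while i <= len(seq) - k:
        if seq[i:i + k] == motif:
            copies += 1
            i += k  # jump past the matched copy
        else:
            i += 1
    return copies * k / len(seq) if seq else 0.0


def telomere_ends(chrom, window=1000, min_fraction=0.5):
    """Report (left_end_has_telomere, right_end_has_telomere) for one chromosome.

    On the forward strand a complete chromosome reads (CCCTAAA)n at its start
    and (TTTAGGG)n at its end. window and min_fraction are hypothetical cutoffs.
    """
    left = repeat_fraction(chrom[:window].upper(), "CCCTAAA")
    right = repeat_fraction(chrom[-window:].upper(), "TTTAGGG")
    return left >= min_fraction, right >= min_fraction
```

Because a gap-free T2T assembly has one contig per chromosome, `contig_n50` applied to it simply returns the length of one of the chromosomes, which is why the N50 reported for LsT2T is chromosome-scale.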
Uniform genome-wide read coverage (Figure 1A) indicated a complete and highly accurate assembly. Interestingly, we observed sporadic regions of elevated ONT read coverage (Figure 1A; Supplemental Table 4) corresponding to chloroplast sequences, suggesting that plastid DNA has been integrated into the nuclear genome. Furthermore, LsT2T has a quality value (QV) of 58 and a BUSCO completeness score of 97.6% (Supplemental Table 2), demonstrating its high accuracy and completeness.

Approximately 2.1 Gb of repetitive elements (REs), constituting 81.4% of the LsT2T genome, were annotated; these were predominantly transposable elements (TEs) (Figure 1C; Supplemental Table 5). Most TEs were LTR retrotransposons, with Gypsy and Copia elements representing 37.84% and 27.23% of the LsT2T genome, respectively. A total of 45,507 protein-coding genes (Supplemental Table 6) were predicted in LsT2T by combining ab initio prediction, homologous protein evidence, and transcriptomic data from five tissues sequenced with NGS and PacBio Iso-Seq. Of these genes, 48.8% were functionally annotated using eggNOG-mapper, and 57.3% were expressed (TPM ≥ 1) in at least one tissue (Supplemental Table 6). The sequences newly assembled in LsT2T relative to the Salinas genome comprised 2.09% genes, 31.34% REs, 16.9% centromeres, and 43.4% rDNA arrays (Supplemental Figure 5B), highlighting the value of a complete genome for uncovering essential genomic regions. In addition, orthogroup-based comparison of the protein-coding genes in the LsT2T, Salinas, and Augustana genomes revealed a high degree of similarity among the three annotations, despite differences in cultivar type, assembly quality, and annotation pipeline.
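Two of the metrics above, the quality value and the TPM expression threshold, reduce to simple formulas. The sketch below assumes a Merqury-style k-mer estimate for QV (the paper does not name the tool it used) and standard transcripts-per-million normalization; the function names and any example counts are hypothetical.

```python
import math


def qv_from_error(error_rate):
    """Phred-scaled quality value: QV = -10 * log10(per-base error rate)."""
    return -10 * math.log10(error_rate)


def error_from_qv(qv):
    """Inverse conversion: the per-base error rate implied by a QV."""
    return 10 ** (-qv / 10)


def kmer_qv(asm_only_kmers, total_kmers, k=21):
    """Merqury-style QV estimate (an assumed method, not stated in the paper).

    asm_only_kmers: assembly k-mers absent from the read set.
    total_kmers: all k-mers in the assembly.
    """
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers, so the error estimate is zero
    p_kmer_ok = 1 - asm_only_kmers / total_kmers
    error_rate = 1 - p_kmer_ok ** (1 / k)  # per-base error from per-k-mer error
    return qv_from_error(error_rate)


def tpm(counts, lengths):
    """Transcripts per million from read counts and transcript lengths."""
    rates = [c / l for c, l in zip(counts, lengths)]  # length-normalized counts
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]


def expressed_fraction(tpm_by_tissue, threshold=1.0):
    """Fraction of genes with TPM >= threshold in at least one tissue.

    tpm_by_tissue maps tissue name -> list of TPM values in a shared gene order.
    """
    columns = list(tpm_by_tissue.values())
    n_genes = len(columns[0])
    expressed = sum(
        1 for g in range(n_genes) if any(col[g] >= threshold for col in columns)
    )
    return expressed / n_genes
```

For instance, `error_from_qv(58)` is about 1.6e-6, roughly one base error per 630 kb.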
LsT2T and Salinas (leaf lettuce) were more similar to each other than to Augustana (stem lettuce) in terms of the number of shared orthogroups (Supplemental Figure 5C).

Centromeres, which are repeat-rich heterochromatic regions, are critical for accurate chromosome segregation during cell division (Cleveland et al., 2003). The lettuce centromeres were identified by ChIP-seq profiling with a lettuce-specific antibody against CENH3 (centromere-specific histone H3), which clearly delineated the boundaries of nine centromeres (Figure 1D; Supplemental Table 7), ranging in size from 2.7 Mb (Chr6) to 4.5 Mb (Chr7). Centromere position varied across chromosomes, with the ratio of long arm to short arm ranging from 1.1 (Chr6) to 3.2 (Chr8) (Figure 1A; Supplemental Figure 4A). Sequence similarity among the centromeres was low (Supplemental Figure 6), suggesting strong diversification. Centromeric repeats consisted predominantly of Gypsy (56.6%), Copia (13.1%), and satellites (16.3%), a composition that differs from that of non-centromeric regions (Figure 1C). In addition, centromeric Gypsy elements were dominated by Tekay, Angela, and centromeric retrotransposons of maize (CRMs) (Supplemental Figure 7A). Notably, CRMs were more frequent in centromeric than in non-centromeric regions, consistent with previous reports for maize and cotton (Chen et al., 2023; Chang et al., 2024). Phylogenetic analysis of Gypsy elements revealed that centromeric CRMs formed a unique clade, suggesting an expansion of centromeric CRMs distinct from non-centromeric CRMs (Supplemental Figure 7B). The proportion of satellites in the centromeres varied from 3.25% (Chr3) to 60.14% (Chr1) (Figure 1D; Supplemental Table 7). De novo identification of centromeric satellite monomers using TRASH revealed 30-bp, 62-bp, 287-bp, and 123-bp monomers as the predominant satellites (Supplemental Figure 7C). We also observed higher-order repeats (Figure 1E; Supplemental Figure 8), composed primarily of 62-bp monomers along with miscellaneous short repeats (Supplemental Figure 7C). Analysis of CENH3 enrichment showed that CENH3 preferentially binds Gypsy elements and satellite sequences (Figure 1E; Supplemental Figures 8 and 9), highlighting their importance in centromere function.

Although the lettuce genome sequence has now been decoded, its 3D genomic landscape remains largely unexplored. We used miniMDS to model the 3D structure of the lettuce genome from high-resolution Hi-C data (Supplemental Figure 10). The 2.59-Gb lettuce genome is organized into topologically associating domains (TADs) and A/B compartments, with a low frequency of A/B compartment switching. Notably, all centromeres were localized in the B compartment (Figure 1E; Supplemental Figure 11). The A compartment had a higher gene density and a lower TE density than the B compartment, and the two compartments displayed distinct epigenetic marks (Figure 1E; Supplemental Figure 11).
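The arm-ratio and satellite-proportion figures quoted above are simple interval calculations once the centromere boundaries are known. The sketch below is a hypothetical illustration, not the authors' procedure: it measures arms from the centromere midpoint (the paper does not say how arm length was defined), uses 0-based half-open [start, end) coordinates, and merges overlapping satellite annotations so that no base is counted twice.

```python
def arm_ratio(chrom_length, cen_start, cen_end):
    """Long-arm / short-arm length ratio.

    Arms are measured from the centromere midpoint (an assumption made for
    this sketch). Coordinates are 0-based, half-open [start, end).
    """
    midpoint = (cen_start + cen_end) / 2
    left_arm = midpoint
    right_arm = chrom_length - midpoint
    return max(left_arm, right_arm) / min(left_arm, right_arm)


def merged_coverage(intervals, start, end):
    """Number of bases in [start, end) covered by the union of intervals."""
    # Clip every interval to the window and drop those that miss it entirely.
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in intervals if s < end and e > start
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


def satellite_fraction(satellite_intervals, cen_start, cen_end):
    """Proportion of a centromere occupied by satellite annotations."""
    return merged_coverage(satellite_intervals, cen_start, cen_end) / (cen_end - cen_start)
```

For a 100-bp toy chromosome with a centromere at [20, 30), the arms measured from the midpoint are 25 and 75 bp, giving a ratio of 3.0.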
ChIP-seq analysis of histone modifications revealed that H3K4me3 and H3K27me3, which mark transcriptional activation and repression, respectively, were enriched in A compartments, whereas B compartments were enriched for H3K9me2, a mark typically associated with heterochromatin (Figure 1E; Supplemental Figure 11). This conserved pattern is consistent with those observed in most plant 3D genomes reported to date.

Given the susceptibility of cultivated lettuce to diseases, developing disease-resistant cultivars is crucial for environmentally friendly disease management. Nucleotide-binding site leucine-rich repeat (NLR) proteins are central to plant immunity against pathogens (Chou et al., 2023). Our systematic analysis identified 514 putative NLR genes in the LsT2T genome, which were classified into seven subfamilies based on phylogenetic analysis of the NB-ARC domain (Figure 1F), indicating high phylogenetic diversity. By contrast, the same approach identified only 484 NLR genes in the v11 genome. Most NLR genes in LsT2T were tandemly duplicated and clustered in the genome, particularly on Chr1 and Chr2 (Figure 1G). Interestingly, four new NLRs were identified in the filled gap regions of LsT2T (Figure 1H; Supplemental Figure 12), including one located within a gap region of Chr4 that was covered exclusively by ONT reads mapped to LsT2T. Transcriptomic analysis of the 514 NLR genes (Supplemental Table 8) revealed that 58 were significantly upregulated during gray mold (Botrytis cinerea) infection compared with mock treatment; of these, 38 encode TIR-NB-ARC(-LRR) domains, so TIR-type NLRs predominated among the upregulated genes (Figure 1F; Supplemental Table 9).
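Selecting infection-induced NLRs and tallying them by domain class can be sketched as a simple filter over differential-expression results. The record type, the gene IDs, and the thresholds (log2 fold change ≥ 1, adjusted P < 0.05) are hypothetical; the paper does not state its cutoffs or the differential-expression tool it used.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class NLRExpression:
    gene_id: str      # hypothetical gene identifier
    nlr_class: str    # domain class, e.g. "TNL" for TIR-NB-ARC(-LRR)
    log2_fc: float    # log2 fold change, infection vs. mock
    padj: float       # multiple-testing-adjusted P value


def upregulated(records, min_log2_fc=1.0, max_padj=0.05):
    """Keep NLRs significantly induced by infection (hypothetical cutoffs)."""
    return [r for r in records if r.log2_fc >= min_log2_fc and r.padj < max_padj]


def count_by_class(records):
    """Tally records per NLR domain class."""
    return Counter(r.nlr_class for r in records)


def most_significant(records):
    """Return the record with the smallest adjusted P value."""
    return min(records, key=lambda r: r.padj)
```

Applying `count_by_class` to the output of `upregulated` gives the kind of per-class breakdown summarized above, for example how many induced genes carry TIR domains.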
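The most significantly upregulated NLR gene, lettuce_v2_00029769, is homologous to the Arabidopsis thaliana gene AT5G36930, which encodes a TIR-NB-ARC-LRR-type NLR. Future functional characterization of these infection-induced NLR genes, revealed by the T2T genome, will provide deeper insights into the mechanisms of lettuce immunity against pathogens.

In summary, we generated the complete T2T genome of lettuce, the first for the Asterids, and dissected in detail the complex genetic and epigenetic landscape of its centromeres. This genome will serve as an essential resource for advancing lettuce research and facilitating genetic improvement.

All raw sequencing data generated for this project have been deposited in the China National Center for Bioinformation under accession number CRA014517, accessible at https://ngdc.cncb.ac.cn/gsa/s/Pya57yDW. The genome assembly and annotation are available on Figshare at https://figshare.com/s/f5f0e8068d5a236ea408.

This project was supported by the Key R&D Program of Shandong Province (ZR202211070163) and the Natural Science Foundation for Distinguished Young Scholars of Shandong Province (ZR2023JQ010). L.G. is also supported by the Taishan Scholars Program of Shandong Province.

References

Chang X., He X., Li J., Liu Z., Pi R., Luo X., Wang R., Hu X., Lu S., Zhang X., et al. (2024). High-quality Gossypium hirsutum and Gossypium barbadense genome assemblies reveal the landscape and evolution of centromeres. Plant Commun. 5: 100722. https://doi.org/10.1016/j.xplc.2023.100722
Chen J., Wang Z., Tan K., Huang W., Shi J., Li T., Hu J., Wang K., Wang C., Xin B., et al. (2023). A complete telomere-to-telomere assembly of the maize genome. Nat. Genet. 55: 1221-1231.
Chou W.C., Jha S., Linhoff M.W., Ting J.P.Y. (2023). The NLR gene family: from discovery to present day. Nat. Rev. Immunol. 23: 635-654.
Cleveland D.W., Mao Y., Sullivan K.F. (2003). Centromeres and kinetochores: from epigenetics to mitotic checkpoint signaling.
Food and Agriculture Organization of the United Nations (2023). FAOSTAT. Rome.
Galieni A., Di Mattia C., De Gregorio M., Speca S., Mastrocola D., Pisante M., Stagnari F. (2015). Effects of nutrient deficiency and abiotic environmental stresses on yield, phenolic compounds and antiradical activity in lettuce (Lactuca sativa L.). Sci. Hortic. 187: 93-101.
Gao F., Li J., Zhang J., Li N., Tang C., Bakpa E.P., Xie J. (2022). Genome-wide identification of the ZIP gene family in lettuce (Lactuca sativa L.) and expression analysis under different element stress. PLoS One 17: e0274319.
Pink H., Talbot A., Graceson A., Graham J., Higgins G., Taylor A., Jackson A.C., Truco M., Michelmore R., Yao C., et al. (2022). Identification of genetic loci in lettuce mediating quantitative resistance to fungal pathogens. Theor. Appl. Genet. 135: 2481-2500.
Reyes-Chin-Wo S., Wang Z., Yang X., Kozik A., Arikit S., Song C., Xia L., Froenicke L., Lavelle D.O., Truco M.J., et al. (2017). Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce. Nat. Commun. 8: 14953.
Richard N., R. S.A.M.H., N. (2004). Diseases of Fruits and Vegetables: Diagnosis and Management. Kluwer Academic Publishers.
Shen F., Qin Y., Wang R., Huang X., Wang Y., Gao T., He J., Zhou Y., Jiao Y., Wei J., et al. (2023). Comparative genomics reveals a unique nitrogen-carbon balance system in Asteraceae. Nat. Commun. 14: 4334.
Wei T., van Treuren R., Liu X., Zhang Z., Chen J., Liu Y., Dong S., Sun P., Yang T., Lan T., et al. (2021). Whole-genome resequencing of 445 Lactuca accessions reveals the domestication history of cultivated lettuce. Nat. Genet. 53: 752-760.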