
The complete telomere-to-telomere genome assembly of lettuce

Telomere, telomere-binding protein, genome, biology, genetics, DNA, computational biology, gene, DNA-binding protein, transcription factor
Authors
Ke Wang, Jingyun Jin, Jingxuan Wang, Sheng Wang, Jie Sun, Dian Meng, Xiangfeng Wang, Li Wang, Li Guo
Source
Journal: Plant Communications [Elsevier]
Volume/issue: 101011-101011
Identifier
DOI: 10.1016/j.xplc.2024.101011
Abstract

Lettuce (Lactuca sativa L.) is an annual plant of the Asteraceae family, commonly used as a fresh-cut vegetable and a primary ingredient in salads. It is rich in vitamins, minerals, polyphenols, and carotenoids, providing numerous health benefits. In 2021, lettuce achieved a gross production value of $16.6 billion worldwide, with China, the United States, and Western Europe as leading lettuce producers (Food and Agriculture Organization of the United Nations, 2023). Most cultivated lettuce varieties are inbred (2n = 18) and exhibit genetic diversity, rendering them susceptible to various abiotic and biotic stresses (Richard, 2004; Galieni et al., 2015). Hence, lettuce breeding efforts primarily focus on improving yield, quality, and disease resistance, and they depend heavily on genetic and genomic resources such as molecular markers, reference genomes, and multi-omics data. The first lettuce genome was assembled using next-generation sequencing (NGS) reads in 2017 (Reyes-Chin-Wo et al., 2017). In 2022, the improved lettuce reference genome v11 of crisphead lettuce cultivar Salinas (GCA_002870075.4) was released; subsequently, Shen et al. (2023) assembled the genome of stem lettuce (L. sativa var. Augustana). Although these assemblies have greatly facilitated lettuce research (Wei et al., 2021; Gao et al., 2022; Pink et al., 2022; Shen et al., 2023), they remain highly fragmented and incomplete, containing hundreds of gaps and omitting key genetic elements such as centromeres, rDNA, and telomeres, which continues to hinder progress in genomic research, gene cloning, and molecular breeding. Here, we report the first complete telomere-to-telomere (T2T) genome of L. sativa cv. PKU06 (Figure 1A), which is widely cultivated and consumed.
The assembly drew on 112.4× coverage of PacBio high-fidelity (HiFi) long reads, 42.9× coverage of Oxford Nanopore Technology (ONT) ultra-long reads (N50 > 100 kb), and 118.8× coverage of Hi-C reads (Supplemental Table 1). Genome assembly was performed using an in-house pipeline (Supplemental Figure 1) as follows. First, the HiFi and ONT reads were assembled using hifiasm, resulting in a draft genome of 125 contigs. After removing microbial and plastid sequences, these contigs were anchored to nine chromosomes using Hi-C data (Supplemental Figure 2). Misplaced or misoriented contigs were manually corrected in Juicebox. This yielded a chromosome-scale assembly with only two remaining gaps on Chr4, which were subsequently filled with the ONT reads to achieve a gap-free assembly (Supplemental Figure 3). The two nucleolus organizer regions (NORs) on Chr1 and Chr8 were successfully resolved, containing a total of 8.63 Mb of rDNA repeat arrays with 884 copies (Figure 1B). The final complete T2T genome (LsT2T) (Figure 1A) is 2593 Mb in size with a contig N50 of 320.7 Mb, about 25.7 times (2565.6% of) the 12.5 Mb of Salinas (Supplemental Table 2). In addition, we identified all 18 telomeres using the seven-base telomere repeats (CCCTAAA and TTTAGGG) (Supplemental Table 3). LsT2T showed high synteny (96.96%) to the Salinas genome, though it displayed structural variants likely due to differences between the two cultivars (Supplemental Figure 4). Notably, LsT2T closed 384 gaps present in the Salinas genome, substantially improving the contiguity of the lettuce genome (Supplemental Table 2).

Extensive validation confirmed the accuracy of LsT2T. First, the Hi-C interaction map of LsT2T showed no obvious structural assembly errors (Supplemental Figure 2). Second, the alignment of all raw sequencing data to LsT2T yielded mapping rates of 99.9%, 96.4%, and 99.9% for HiFi, ONT, and NGS reads, respectively (Supplemental Table 1).
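Two of the quantities in the assembly description above, the contig N50 and the telomere repeat check, can be illustrated with a minimal Python sketch. This is not the authors' in-house pipeline: the function names, the 10-kb end window, and the toy inputs are assumptions made for illustration, and the text does not state how many repeat copies were required to call a telomere.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def fold_change(new, old):
    """Ratio of a new value to an old one (e.g. two N50 values)."""
    return new / old


def telomere_repeat_counts(seq, window=10_000,
                           motifs=("CCCTAAA", "TTTAGGG")):
    """Count telomere motif copies in the first and last `window` bases.

    The window size is a hypothetical choice, not taken from the paper.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    head, tail = seq[:window], seq[-window:]
    return {
        "start": sum(head.count(m) for m in motifs),
        "end": sum(tail.count(m) for m in motifs),
    }


if __name__ == "__main__":
    toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths
    print(n50(toy_contigs))
    # N50 values (Mb) as quoted in the text for LsT2T and Salinas
    print(round(fold_change(320.7, 12.5), 1))
    toy_chrom = "CCCTAAA" * 5 + "ACGT" * 10 + "TTTAGGG" * 4
    print(telomere_repeat_counts(toy_chrom, window=40))
```

With the rounded N50 values quoted in the text, the ratio works out to about 25.7. A chromosome would count as telomere-to-telomere in this sketch only if both ends show a run of repeats; the cutoff for "a run" is left to the user.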
Uniform genome-wide read coverage (Figure 1A) indicated a complete and highly accurate assembly. Interestingly, we observed sporadic instances of elevated coverage in ONT reads (Figure 1A; Supplemental Table 4) corresponding to chloroplast sequences, suggesting the integration of plastid genome sequences into the nuclear genome. Furthermore, LsT2T has a quality value of 58 and a BUSCO score of 97.6% (Supplemental Table 2), demonstrating its high accuracy and completeness.

Approximately 2.1 Gb of repetitive elements (REs), constituting 81.4% of the LsT2T genome, were annotated, composed predominantly of transposable elements (TEs) (Figure 1C; Supplemental Table 5). Notably, the majority of these TEs were LTR retrotransposons, with Gypsy and Copia elements representing 37.84% and 27.23% of the LsT2T genome, respectively. A total of 45,507 protein-coding genes (Supplemental Table 6) were predicted in LsT2T using ab initio prediction, comparison with homologous proteins, and transcriptomic data from five different tissues sequenced using NGS and PacBio Iso-seq. Of these genes, 48.8% were functionally annotated using eggNOG-mapper, and 57.3% were expressed in at least one tissue, with a threshold of TPM ≥ 1 (Supplemental Table 6). Analysis of newly assembled sequences in LsT2T compared to the Salinas genome revealed that these sequences consisted of 2.09% genes, 31.34% REs, 16.9% centromeres, and 43.4% rDNA arrays (Supplemental Figure 5B), highlighting the significance of a complete genome in uncovering essential genomic regions. In addition, comparative analysis of the protein-coding genes in the LsT2T, Salinas, and Augustana genomes through orthogroup identification revealed a high degree of similarity across the three genome annotations, despite the differences in cultivar types, assembly quality, and annotation pipelines.
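The expression filter mentioned above (TPM ≥ 1 in at least one tissue) can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not say which quantification tool produced the TPM values, so the `tpm` function below uses the standard definition (reads per kilobase, rescaled to sum to one million per sample), and the function names and toy data are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts for one sample to TPM.

    counts and lengths_bp are parallel lists, one entry per gene.
    """
    # Reads per kilobase of gene length.
    rpk = [c / (l / 1000.0) for c, l in zip(counts, lengths_bp)]
    scale = sum(rpk)
    if scale == 0:
        return [0.0 for _ in rpk]
    # Rescale so that the values in a sample sum to one million.
    return [r / scale * 1_000_000 for r in rpk]


def fraction_expressed(tpm_by_tissue, threshold=1.0):
    """Fraction of genes with TPM >= threshold in at least one tissue.

    tpm_by_tissue maps a tissue name to a list of TPM values,
    with the same gene order in every list.
    """
    columns = list(tpm_by_tissue.values())
    if not columns or not columns[0]:
        raise ValueError("need at least one tissue and one gene")
    n_genes = len(columns[0])
    expressed = sum(
        1 for i in range(n_genes)
        if any(values[i] >= threshold for values in columns)
    )
    return expressed / n_genes


if __name__ == "__main__":
    # Hypothetical TPM values for three genes in two tissues.
    toy = {"leaf": [0.5, 2.0, 0.0], "root": [0.2, 0.1, 0.9]}
    print(fraction_expressed(toy))
```

In the toy data only the second gene reaches the threshold in any tissue, so one gene out of three counts as expressed.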
LsT2T and Salinas (leaf lettuce) were more similar to each other than to Augustana (stem lettuce) in terms of the number of shared orthogroups (Supplemental Figure 5C).

Centromeres, which are repeat-rich heterochromatic regions, are critical for accurate chromosome segregation during cell division (Cleveland et al., 2003). The centromeres of lettuce were identified through ChIP-seq profiling using a lettuce-specific CENH3 (centromere-specific histone 3) antibody, which clearly delineated the boundaries of nine centromeres (Figure 1D; Supplemental Table 7), ranging in size from 2.7 Mb (Chr6) to 4.5 Mb (Chr7). The position of centromeres varied across chromosomes, with the ratio of long arm to short arm ranging from 1.1 (Chr6) to 3.2 (Chr8) (Figure 1A; Supplemental Figure 4A). Low sequence similarity among the centromeres was observed (Supplemental Figure 6), suggesting strong diversification. Centromeric repeats predominantly consisted of Gypsy (56.6%), Copia (13.1%), and satellites (16.3%), differing from those in non-centromeric regions (Figure 1C). In addition, centromeric Gypsy elements were dominated by Tekay, Angela, and centromeric retrotransposons of maize (CRMs) (Supplemental Figure 7A). Notably, CRMs appeared more frequently in centromeric than non-centromeric regions, consistent with previous reports for maize and cotton (Chen et al., 2023; Chang et al., 2024). Phylogenetic analysis of Gypsy revealed that centromeric CRMs formed a unique clade, suggesting the expansion of centromeric CRMs distinct from non-centromeric CRMs (Supplemental Figure 7B). The proportions of satellites in the centromeres varied from 3.25% (Chr3) to 60.14% (Chr1) (Figure 1D; Supplemental Table 7). De novo identification of centromeric satellite monomers using TRASH revealed 30-bp, 62-bp, 287-bp, and 123-bp monomers as the predominant satellites (Supplemental Figure 7C). We also observed higher-order repeats (Figure 1E; Supplemental Figure 8), primarily composed of 62-bp monomers along with miscellaneous short repeats (Supplemental Figure 7C). Analysis of CENH3 enrichment demonstrated that CENH3 preferentially binds to Gypsy elements and satellite sequences (Figure 1E; Supplemental Figures 8 and 9), highlighting their importance in centromere function.

Although the lettuce genome has been decoded, its 3D genomic landscape remains largely unexplored. We utilized miniMDS to model the 3D structure of the lettuce genome using high-resolution Hi-C data (Supplemental Figure 10). The 2.59-Gb lettuce genome is organized into topologically associated domains (TADs) and A/B compartments, exhibiting a low frequency of A/B compartment switching. Notably, all centromeres were localized in the B compartment (Figure 1E; Supplemental Figure 11). The A compartment demonstrated a higher gene density and lower TE density than the B compartment, and both compartments displayed distinctive epigenetic markers (Figure 1E; Supplemental Figure 11).
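The comparison of gene and TE density between A and B compartments described above can be sketched in a few lines of Python. This is only an illustration: the text does not describe how densities were computed, so the per-bin input format, the function name `compartment_densities`, and the toy numbers are all assumptions.

```python
from collections import defaultdict


def compartment_densities(bins):
    """Summarize gene and TE density per compartment.

    bins is an iterable of tuples
    (compartment, bin_length_bp, gene_count, te_bp),
    one tuple per genomic bin that already carries an A/B label.
    """
    totals = defaultdict(lambda: {"bp": 0, "genes": 0, "te_bp": 0})
    for compartment, length_bp, genes, te_bp in bins:
        entry = totals[compartment]
        entry["bp"] += length_bp
        entry["genes"] += genes
        entry["te_bp"] += te_bp
    return {
        name: {
            # Genes per megabase of sequence in this compartment.
            "genes_per_mb": t["genes"] / (t["bp"] / 1e6),
            # Fraction of compartment bases annotated as TE.
            "te_fraction": t["te_bp"] / t["bp"],
        }
        for name, t in totals.items()
        if t["bp"] > 0
    }


if __name__ == "__main__":
    # Hypothetical 1-Mb and 2-Mb bins, not data from the paper.
    toy_bins = [
        ("A", 1_000_000, 40, 500_000),
        ("A", 1_000_000, 60, 700_000),
        ("B", 2_000_000, 20, 1_800_000),
    ]
    for name, stats in sorted(compartment_densities(toy_bins).items()):
        print(name, stats)
```

In the toy input the A bins carry more genes per megabase and a smaller TE fraction than the B bin, which is the direction of the difference reported in the text.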
ChIP-seq analysis of histone modifications revealed that H3K4me3 and H3K27me3, which mark transcription activation and repression, respectively, were enriched in A compartments, whereas B compartments showed enrichment for H3K9me2, typically associated with heterochromatin (Figure 1E; Supplemental Figure 11). This conserved pattern is consistent with those observed in most plant 3D genomes reported thus far.

Given the susceptibility of cultivated lettuce to diseases, developing disease-resistant cultivars is crucial for environmentally friendly disease management. Nucleotide-binding site leucine-rich repeat (NLR) proteins are crucial for plant immunity against pathogens (Chou et al., 2023). Our systematic analysis identified 514 putative NLR genes in the LsT2T genome, which were classified into seven subfamilies based on a phylogenetic analysis of the NB-ARC domain (Figure 1F). This classification indicates high phylogenetic diversity. By contrast, the same approach identified only 484 NLR genes in the v11 genome. The majority of NLR genes in the LsT2T genome were tandemly duplicated and genomically clustered, particularly on Chr1 and Chr2 (Figure 1G). Interestingly, four new NLRs were identified in the filled gap regions of LsT2T (Figure 1H; Supplemental Figure 12), including one specifically located within a gap region of Chr4 that was exclusively covered by ONT reads mapped to LsT2T. Transcriptomic analysis of the 514 NLR genes (Supplemental Table 8) revealed that 58 of these genes were significantly upregulated during gray mold (Botrytis cinerea) infection compared to mock treatments, and 38 of these genes encoding TIR-NB-ARC(-LRR) domains were predominantly upregulated (Figure 1F; Supplemental Table 9).
The most significantly upregulated NLR gene, lettuce_v2_00029769, is homologous to the Arabidopsis thaliana AT5G36930 gene, which encodes a TIR-NB-ARC-LRR type NLR. The future functional characterization of these infection-induced NLR genes, as revealed by the T2T genome, will provide deeper insights into the mechanisms of lettuce immunity against pathogens.

In summary, we generated the complete T2T genome of lettuce, the first for Asterids, and thoroughly dissected the complex genetic and epigenetic landscape of its centromeres. This genome will serve as an essential resource for advancing lettuce research and facilitating genetic improvements.

All raw sequencing data generated for this project have been deposited in the China National Center for Bioinformation under accession number CRA014517, accessible at https://ngdc.cncb.ac.cn/gsa/s/Pya57yDW. The genome assembly and annotation are available on Figshare at https://figshare.com/s/f5f0e8068d5a236ea408.

This project was supported by the Key R&D Program of Shandong Province (ZR202211070163) and the Natural Science Foundation for Distinguished Young Scholars of Shandong Province (ZR2023JQ010). L.G. is also supported by the Taishan Scholars Program of Shandong Province.