Abstract
Lynch syndrome (LS) is characterized by the development of mismatch repair–deficient (dMMR) colorectal and endometrial cancers at a young age. LS is caused by germline pathogenic variants (PVs) in 1 of the MMR genes MLH1, MSH2, MSH6, or PMS2, or by deletions affecting the 3′ region of EPCAM (ref. 1: Lynch H.T. et al. Clin Genet. 2009;76:1-18). Current germline diagnostics for LS include targeted short-read sequencing and multiplex ligation-dependent probe amplification of the coding regions of the MMR genes, after the exclusion of somatic MLH1-promoter hypermethylation. In the absence of a germline PV in an MMR gene, the presence of somatic dMMR is investigated. However, a proportion of individuals with dMMR tumors remain genetically unresolved after germline and somatic analyses. These individuals have an unexplained dMMR tumor and are known as individuals with Lynch-like syndrome (LLS) (Figure 1A) (ref. 2: Elze L. et al. Gastroenterology. 2021;160:1414-1416; ref. 3: Mensenkamp A.R., Vogelaar I.P. et al. Gastroenterology. 2014;146:643-646). For individuals with LLS and their relatives, the appropriate treatment options and surveillance are still unclear. Here, by applying targeted long-read sequencing of the MMR genes, we show that a substantial proportion of individuals with LLS can in fact be diagnosed with LS, because they carry deep intronic MMR gene aberrations that result in aberrant splicing (for further reading on aberrant splicing, see ref. 4: Lord J. et al. Front Genet. 2021;12:689892).

The study cohort consisted of 32 individuals diagnosed with an unexplained dMMR cancer (Figure 1A) aged ≤70 years with (n = 18) or without (n = 14) a family history of colorectal or endometrial cancer.
For each individual, the MMR gene(s) of interest (Supplementary Table 1) was amplified using long-range polymerase chain reaction amplicons of 7–18 kb in size (Supplementary Figure 1A) on germline DNA and sequenced using Single Molecule Real-Time Sequencing (Pacific Biosciences). For details on sequencing and annotation see Supplementary Methods. This study (CMO-2018-4922) was approved by the Radboudumc Ethical Committee. Nine different noncoding aberrations in 9 of 32 individuals (28.1%) were identified (Figure 1B). Five different deep intronic single nucleotide variants in MSH2 (n = 3), MLH1 (n = 1), and PMS2 (n = 1) that are likely to introduce novel splice sites based on in silico predictions were identified in 6 individuals. Additionally, 2 intronic Alu element insertions (>96% similarity to Alu elements) in introns 1 and 8 of MSH2 (of which the latter was in cis with a deep-intronic single nucleotide variant), 1 1704-bp intronic deletion in intron 3 of MLH1, and 1 tandem duplication in the MLH1 promoter region were identified. None of the variants was previously reported in population databases (GnomAD-G, 1000G). We performed co-segregation analyses where possible and analyzed the effects of 7 potentially pathogenic noncoding variants in a mini-gene assay to assess altered splicing and the effect of the duplication in the MLH1 promoter region using a luciferase reporter assay. Individual LLP004 carried 2 deep intronic MSH2 variants: c.793-603C>T and c.2458+976A>G. Co-segregation analysis showed that c.793-603C>T did not co-segregate with cancer phenotypes within the family and therefore was excluded for functional analysis, whereas c.2458+976A>G did segregate in 2 affected cousins and an affected aunt (Figure 1C, family A). Quantitative analysis of MSH2 mRNA from a lymphoblastoid cell line from an affected cousin showed nonsense-mediated decay and decreased MSH2 expression compared with healthy control samples (Supplementary Figure 1D). 
The same MSH2 c.2458+976A>G variant was also found in individual LLP009, who, as far as we could determine, is not related to LLP004. Mini-gene analysis of c.2458+976A>G showed alternative splicing caused by the generation of a new splice donor site and activation of an intronic splice acceptor site, leading to a premature stop codon at p.(Gly820Glufs∗47).

Individual LLP031 carried an Alu element insertion (c.1387-3546_1387-3545ins351) and a deep intronic variant (c.2459-954A>G) in cis in MSH2. Both variants co-segregated in the sister and the mother of the index (Figure 1C, family B). Mini-gene analysis of MSH2 c.1387-3546_1387-3545ins351 showed predominant expression of the wild-type transcript but also some transcripts of alternative size (Supplementary Figure 1B). However, because the mini-gene analysis of the c.2459-954A>G variant showed the inclusion of a pseudoexon (68 bp) between exons 14 and 15 that led to a premature stop codon at p.(Gly820Glufs∗44), we considered the latter to be the PV in this individual.

Co-segregation analysis was not possible for the other LLS individuals. Therefore, the remaining potential PVs were analyzed only by mini-gene analysis or a luciferase reporter assay. Mini-gene analysis of the Alu element insertion affecting the splice acceptor consensus sequence of MSH2 exon 2 (c.212-4_212-3ins366; individual LLP024) and of the deep intronic MLH1 variants c.306+1001_307-642delinsTA (individual LLP032) and c.306+1070C>G (individual LLP002) showed altered splicing compared with wild-type, and all 3 resulted in predicted premature stop codons (Figure 1D, Supplementary Figure 1B and C).

The effect of the 48-bp tandem duplication in the MLH1 promoter in individual LLP025 (MLH1 c.-404_-357dup) was assessed using a luciferase reporter assay (Supplementary Methods) (ref. 5: Hitchins M.P. et al. Cancer Cell. 2011;20:200-213). However, the MLH1 c.[-404_-357dup;-93G>A] haplotype of the individual showed a level of promoter activity similar to that of the wild-type MLH1 promoter (Supplementary Figure 1E). In addition, germline MLH1 promoter hypermethylation was not observed. Therefore, this duplication is currently considered a variant of unknown significance. Mini-gene analysis of the deep intronic PMS2 c.2276-400G>C variant, identified in individuals LLP014 and LLP029, did not indicate altered splicing compared with wild-type.

Until now, deep intronic PVs, MLH1 promoter variants, and Alu-mediated structural variants have been reported only in isolated cases or in very small proportions of LLS cohorts (ref. 5; ref. 6: Clendenning M. et al. Fam Cancer. 2011;10:297-301; ref. 7: Li L. et al. Hum Mutat. 2006;27:388; ref. 8: Ward R.L. et al. Genet Med. 2013;15:25-35; ref. 9: Arnold A.M. et al. Eur J Hum Genet. 2020;28:597-608). However, our analyses indicate that a substantial proportion of individuals with LLS (18.8%; 6/32 individuals) can be diagnosed with LS because of a germline noncoding pathogenic aberration in an MMR gene. These findings warrant the expansion of current diagnostic short-read–sequencing panels to include known pathogenic intronic variants to increase the diagnostic yield for LS. Moreover, for individuals with LLS who remain without a genetic diagnosis after both germline and somatic routine diagnostic analyses, it is of interest to perform germline sequencing of the complete MMR gene loci.
In such an approach, long-read sequencing may facilitate the detection of deep intronic variants not covered by current diagnostic panels and allow the detection of Alu insertions or tandem repeats, which are very difficult to detect using short-read–sequencing technologies. Together, analysis of the intronic regions of the MMR genes further optimizes LS diagnostics and consequently improves treatment and cancer surveillance in patients and relatives.

The authors thank all study participants. The authors thank Neeltje Arts, Edris Askar, Anouk Bertram, Linske de Bruijn, Ronny Derks, Michael Kwint, Mieke Lutje-Berenbroek, Luke O’Gorman, Bruce Poppe, Robin de Putter, Hanneke Volleberg-Gorissen, Marcel Nelen, and Lisenka Vissers for their contributions to this project. Furthermore, the authors thank the Radboudumc Genome Technology Center for infrastructural and computational support.

The ERN-GENTURIS Lynch-like Working Group (alphabetical order, grouped per hospital) includes Stéphanie Baert-Desurmont,1 Kathleen B. M. Claes,2 Kim de Leeneer,2 Lisa Elze,3 Simone van den Heuvel,3 Rachel S. van der Post,4 Yvonne van Twuijver,3 Tjakko J. van Ham,5 Anja Wagner,5 Mirjam M. de Jong,6 Edward M.
Leter,7 Maartje Nielsen,8 from the 1Department of Genetics, Normandy Center for Genomic and Personalized Medicine, UNIROUEN, Inserm U1245 and Rouen University Hospital, Rouen, France; 2Centre for Medical Genetics, Ghent University Hospital, Department of Biomolecular Medicine, Ghent University, Medical Genetics, Ghent, Belgium; 3Department of Human Genetics, Radboud University Medical Center, Nijmegen, the Netherlands; 4Department of Pathology, Radboud University Medical Center, Radboud Institute for Molecular Life Sciences, Nijmegen, the Netherlands; 5Department of Clinical Genetics, Erasmus University Medical Center, Erasmus MC Cancer Institute, Rotterdam, the Netherlands; 6Department of Clinical Genetics, University Medical Center Groningen, Groningen, the Netherlands; 7Department of Clinical Genetics, Maastricht University Medical Center +, Maastricht, the Netherlands; and 8Center for Human and Clinical Genetics, Leiden University Medical Center, Leiden, the Netherlands. Iris B. A. W. te Paske, MSc (Data curation: Equal; Formal analysis: Equal; Investigation: Equal; Validation: Equal; Writing – original draft: Equal; Writing – review & editing: Equal). Arjen R. Mensenkamp, PhD (Data curation: Equal; Formal analysis: Equal; Investigation: Equal; Validation: Equal; Writing – review & editing: Equal). Kornelia Neveling, PhD (Data curation: Equal; Formal analysis: Equal; Writing – review & editing: Equal). Nicoline Hoogerbrugge, MD, PhD (Conceptualization: Equal; Formal analysis: Equal; Investigation: Equal; Supervision: Equal; Validation: Equal; Writing – review & editing: Equal). Marjolijn J. L. Ligtenberg, PhD (Conceptualization: Equal; Formal analysis: Equal; Investigation: Equal; Supervision: Equal; Validation: Equal; Writing – review & editing: Equal). Richarda M. de Voer, PhD (Conceptualization: Equal; Formal analysis: Equal; Investigation: Equal; Supervision: Equal; Writing – original draft: Equal; Writing – review & editing: Equal). 
The study cohort included 32 individuals without a germline PV and without somatic dMMR in 1 of the MMR genes, without MLH1 promoter methylation, and with at most 1 somatically inactivated allele (Figure 1A). Germline and somatic variants of unknown significance (class 3 variants) were considered not explanatory. Inclusion was prioritized based on the presence of an LS-associated cancer in persons aged ≤70 years (Supplementary Table 1). For every individual, immunohistochemistry results and somatic sequencing results were taken into consideration to decide which genes should be amplified.

For each MMR gene (transcripts used for design: MLH1:NM_000249.4; EPCAM-MSH2:NM_002354.3-NM_000251.3; MSH6:NM_000179.3; PMS2:NM_000535.7), long-range polymerase chain reaction (PCR) amplicons ranging from 7 to 18 kb were designed, with a 1-kb overlap between amplicons (Supplementary Figure 1A). For every amplicon, leukocyte-derived genomic DNA (gDNA) was amplified on a ProFlex PCR system (Thermo Fisher Scientific) using LongAmp Hot Start Taq 2x master mix (New England Biolabs) (protocol and primer sequences are available on request). PCR products were checked on a 0.8% agarose gel, and concentrations were measured by Qubit HS dsDNA Assay (Thermo Fisher) before equimolar pooling per individual.

Library preparation was performed according to the protocol Procedure and Checklist - Preparing Single Molecule Real-Time (SMRT)bell Libraries using PacBio Barcoded Adapters for Multiplex SMRT Sequencing (Pacific Biosciences). Polymerase-bound SMRTbell complexes were generated using the Sample Setup option in SMRT Link (Pacific Biosciences), and the SMRTbell complex was loaded onto a SMRT Cell and sequenced on either a Sequel I or a Sequel IIe system. Circular consensus long reads meeting the quality control (QC) threshold of ≥20 (≤1% error rate) were mapped against GRCh37/hg19 within SMRT Link.
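Two numeric details in the paragraph above can be made concrete with a small sketch: how amplicons with a fixed overlap tile a locus, and why a quality value of 20 corresponds to an error rate of 1%. The function names, the fixed 10-kb amplicon length, and the coordinates are illustrative assumptions; the actual amplicons varied from 7 to 18 kb and their primer positions are not given in the text.

```python
def tile_amplicons(locus_start, locus_end, amplicon_len=10_000, overlap=1_000):
    """Tile [locus_start, locus_end) with amplicons that overlap by `overlap` bp.

    Hypothetical helper: uses one fixed amplicon length, whereas the real
    design used variable lengths chosen around suitable primer sites.
    """
    if locus_end <= locus_start:
        raise ValueError("locus_end must be greater than locus_start")
    if not 0 <= overlap < amplicon_len:
        raise ValueError("overlap must be non-negative and shorter than the amplicon")

    amplicons = []
    start = locus_start
    while True:
        end = min(start + amplicon_len, locus_end)  # clip the last amplicon
        amplicons.append((start, end))
        if end >= locus_end:
            break
        start = end - overlap  # next amplicon begins inside the previous one
    return amplicons


def phred_to_error_probability(quality):
    """Convert a Phred-scaled quality value to an error probability."""
    return 10 ** (-quality / 10)


if __name__ == "__main__":
    # Illustrative 28-kb locus with made-up coordinates.
    for start, end in tile_amplicons(0, 28_000):
        print(f"amplicon {start}-{end} ({end - start} bp)")
    print(f"Q20 error probability: {phred_to_error_probability(20):.2%}")
```

In this simplified version the last amplicon can come out shorter than the 7-kb minimum described above; a real design would shift or resize the final amplicons instead of clipping them.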
For single nucleotide variant and small indel detection, mapped bam files were loaded into JSI SeqNext Software v5.1.0 Build 503 (JSI Medical Systems GmbH) to perform quality filtering and variant calling. Variant call format (VCF) files containing distinct variants (≥20% coverage per direction) were annotated using an in-house annotation pipeline and were filtered for GnomAD-G allele frequency < 0.1% and in-house database frequency (>26,000 alleles) < 0.1%. Coding variants and variants with a SpliceAI delta score > 0.1 were included for (re)evaluation. Coding variants located upstream of EPCAM exon 8 were excluded. Noncoding variants were filtered for variant allele frequency ≥ 35% (correction for mapping difficulties in mono-nucleotide repeats) and selected if they had a SpliceAI delta score > 0.1 (supplementary ref. 1: Jaganathan K. et al. Cell. 2019;176:535-548). All bam files were also visually assessed using the Integrative Genomics Viewer v.2.10/12 (Broad Institute), with settings to flag supplementary aligned reads and to flag indels (>10 bp), to identify structural variants.

Co-segregation analysis was performed by Sanger sequencing of gDNA isolated from leukocytes or formalin-fixed paraffin-embedded tissue of the index individual and available family members.

For each variant predicted in silico to be likely pathogenic, a mini-gene construct was generated as previously described by Sangermano et al. (supplementary ref. 2: Sangermano R. et al. Ophthalmology. 2016;123:1375-1385). In short, gDNA or the corresponding long-range PCR product from the affected individual, or reference gDNA, was amplified using specific primers located in the genomic regions upstream and downstream of the exons flanking the specific variant. The product was cloned into a pDONR201 vector by means of Gateway Cloning (Thermo Fisher).
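The variant selection thresholds stated in these methods (population and in-house allele frequency < 0.1%, variant allele frequency ≥ 35% for noncoding variants, SpliceAI delta score > 0.1, and exclusion of coding variants upstream of EPCAM exon 8) can be expressed as a simple filter. This is only a sketch under assumptions: the `Variant` record, its field names, and the example values are hypothetical, and the study's in-house pipeline is not described in enough detail to reproduce.

```python
from dataclasses import dataclass

# Thresholds taken from the methods text.
MAX_ALLELE_FREQUENCY = 0.001   # GnomAD-G and in-house database, < 0.1%
MIN_NONCODING_VAF = 0.35       # variant allele frequency, >= 35%
MIN_SPLICEAI_DELTA = 0.1       # SpliceAI delta score, > 0.1


@dataclass
class Variant:
    """Hypothetical annotated variant record (field names are assumptions)."""
    name: str
    is_coding: bool
    gnomad_af: float
    inhouse_af: float
    vaf: float
    spliceai_delta: float
    upstream_of_epcam_exon8: bool = False


def is_candidate(variant: Variant) -> bool:
    """Return True if the variant would be kept for (re)evaluation."""
    # Rare in both the population database and the in-house database.
    if variant.gnomad_af >= MAX_ALLELE_FREQUENCY:
        return False
    if variant.inhouse_af >= MAX_ALLELE_FREQUENCY:
        return False
    if variant.is_coding:
        # Coding variants are kept unless they lie upstream of EPCAM exon 8.
        return not variant.upstream_of_epcam_exon8
    # Noncoding variants need sufficient allele fraction and a splice prediction.
    return (variant.vaf >= MIN_NONCODING_VAF
            and variant.spliceai_delta > MIN_SPLICEAI_DELTA)


if __name__ == "__main__":
    # Made-up example records, not data from the study.
    examples = [
        Variant("rare_intronic_splice", False, 0.0, 0.0, 0.48, 0.35),
        Variant("rare_intronic_no_splice", False, 0.0, 0.0, 0.50, 0.02),
        Variant("common_intronic", False, 0.02, 0.01, 0.50, 0.40),
        Variant("low_vaf_repeat_artifact", False, 0.0, 0.0, 0.20, 0.30),
    ]
    for v in examples:
        print(v.name, is_candidate(v))
```

Structural variants such as Alu insertions would not pass through a filter like this; in the study they were identified by visual inspection of the alignments.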
When reference gDNA was used, site-directed mutagenesis was performed to introduce the variant of interest. Subsequently, wild-type and mutant constructs were cloned into the pCI-NEO-RHOexon3,5/DEST vector. Expression vectors were transfected into HEK293T and HCT116 cell lines using FuGENE 6 (Promega). Cells were harvested 48 hours after transfection for total mRNA isolation using the RNeasy Mini kit (Qiagen). Complementary DNA (cDNA) was generated from mRNA using the iScript cDNA Synthesis kit (Bio-Rad), followed by reverse transcriptase (RT) PCR amplification of the region of interest using primers located in RHO exon 3 and exon 5. RT-PCR products were run on an agarose gel. Bands were purified from the gel and analyzed by Sanger sequencing.

For quantitative expression analysis, mRNA was isolated using the QIAamp RNA Blood-Kit (Qiagen) according to the manufacturer’s protocol from peripheral blood lymphocytes that were cultured in the presence and absence of cycloheximide, as previously described by Weren et al. (supplementary ref. 3: Weren R.D. et al. Nat Genet. 2015;47:668-671). Total cDNA was generated as described above. For transcript quantification, 5 μL of cDNA (concentration, 0.2 ng/μL) was mixed with GoTaq qPCR Master Mix (Promega) according to the manufacturer’s protocol. To determine MSH2 expression, primers targeting MSH2 exons 2–3, 6–7, and 10–11 were used. Real-time quantitative RT-PCR was performed on a 7500 Fast Real-Time PCR system (Applied Biosystems) with HPRT1 as control. Data represent the mean ± SD of 3 replicates. Statistical significance was determined using the 2-tailed unpaired Welch's t-test.

The MLH1 promoter region c.-513 to c.-1 was amplified from gDNA of the individual carrying the MLH1 c.-404_-357dup variant and from reference gDNA and cloned into the pGL3-Basic-GW vector containing a firefly luciferase cassette by means of Gateway Cloning.
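The Welch's t-test named for the expression comparison differs from Student's t-test in that it does not assume equal variances in the two groups. A minimal sketch of the test statistic and its Welch-Satterthwaite degrees of freedom is shown here using only the standard library; the function name and the replicate values are made up for illustration and are not data from the study. Obtaining the 2-tailed P value additionally requires the t distribution, which the standard library does not provide.

```python
import math
from statistics import mean, variance


def welch_t_test(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least 2 observations")

    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")

    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


if __name__ == "__main__":
    # Hypothetical relative expression values (3 replicates each).
    control = [1.0, 1.1, 0.9]
    carrier = [0.5, 0.6, 0.4]
    t_stat, dof = welch_t_test(control, carrier)
    print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.2f}")
```

With equal group sizes and equal variances, as in this made-up example, the degrees of freedom reduce to the pooled value; they drop below it when the two variances differ.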
Per transfection, 50 ng of firefly reporter plasmid was cotransfected with 5 ng pRL-SV40 Renilla luciferase reporter vector (Promega) into HEK293T and HCT116 cells using FuGENE 6. The firefly reporter plasmids contained the wild-type promoter, a positive control (c.-27C>A; associated with decreased luciferase activity; supplementary ref. 4: Hitchins M.P. et al. Cancer Cell. 2011;20:200-213), the LLS individual's wild-type allele (c.-93G>A), the LLS individual's mutant allele (c.[-404_-357dup;-93G>A]), or the duplication only (c.-404_-357dup); an empty vector and the pGL3-SV40 promoter (data not shown) were also included. After 48 hours, cells were harvested and lysed, and luciferase activity was measured with the Dual-Luciferase Reporter Assay kit (Promega) according to the manufacturer’s protocol. Firefly luciferase units were normalized to Renilla luciferase units. Grubbs' test was applied (alpha 0.1) to remove outliers. Data represent the mean ± SD of 3 replicates. Statistical significance was determined using the 2-tailed unpaired t-test.

Supplementary Table 1. Characteristics of the Study Cohort

| Individual | Gender | Diagnosis and age | Family history of CRC/EC | Immunohistochemistry result | Germline variant found in diagnostics | Genes tested | Somatic variant (class 4/5) | Somatic variant (class 3) | Somatic variants | Germline variant found in this study |
|---|---|---|---|---|---|---|---|---|---|---|
| LLP001 | F | CRC23, POL32 | Y | MLH1/PMS2 negative | N | MLH1 | MLH1 LOH | N | One | N |
| LLP002 | F | KC53, CRC58 | U | MLH1 negative, PMS2 positive | N | MLH1 | MLH1 LOH | N | One | SNV (deep intronic): MLH1 c.306+1070C>G |
| LLP003 | M | CRC66 | N:POL | MLH1/PMS2 negative | N | MLH1 | MLH1 LOH | N | One | N |
| LLP004 | M | CRC33 | Y | MSH2/MSH6 negative | N | MSH2 | MSH2 c.942+3A>T; no indication for LOH | N | One | SNVs (deep intronic): MSH2 c.793-603C>T; MSH2 c.2458+976A>G |
| LLP005 | F | EC50 | Y | MSH2/MSH6 negative | N | MSH2 | MSH2 c.211+2delT; no indication for LOH | N | One | N |
| LLP006 | F | EC52 | U | MSH6 negative | N | MSH6 | MSH6 c.3261dup: p.(Phe1088fs); no indication for LOH | N | One | N |
| LLP007 | M | CRC59 | Y | MSH6 negative | N | MSH6 | MSH6 LOH | N | One | N |
| LLP008 | M | CRC51 | N | PMS2 weak positive | c.2533C>G; p.(His845Asp) (Class 3) | PMS2 | PMS2 c.780del: p.(Asp261∗); no indication for LOH | N | One | N |
| LLP009 | M | UC38, CRC44 | N | MSH2/MSH6 negative | N | MSH2 | No hits in somatic analysis, no indication for LOH | N | None | SNV (deep intronic): MSH2 c.2458+976A>G |
| LLP010 | M | CRC25 | Y | MLH1/PMS2 negative | N | MLH1 | MLH1 LOH | MLH1 c.193G>T: p.(Gly65Cys) | One | N |
| LLP011 | M | CRC37 | N | MLH1/PMS2 negative | N | MLH1 | MLH1 c.884+4A>G; no indication for LOH | MLH1 c.922C>T: p.(His308Tyr) | One | N |
| LLP012 | F | CRC41 | N:POL | MLH1/PMS2 negative | N | MLH1 | SNPs do not show LOH | MLH1 c.1585T>C: p.(Ser529Pro), MLH1 c.1919C>T: p.(Pro640Leu) | None | N |
| LLP013 | M | CRC43 | N | MSH2/MSH6 weak positive | N | MSH2 and MSH6 | MSH6 c.3119_3120del: p.(Phe1040∗); no indication for LOH | MSH6 c.1153_1155del: p.(Arg385del) | One | N |
| LLP014 | F | CRC31 | Y | PMS2 negative, MSH6 partly positive | N | PMS2 | PMS2 c.486del: p.(Leu162fs); no indication for LOH | N | One | SNV (deep intronic): PMS2 c.2276-400G>C |
| LLP015 | M | CRC35 | N | MSH2/MSH6 negative | N | MSH2 | MSH2 LOH | N | One | N |
| LLP016 | F | CRC46 | Y | MLH1/PMS2 negative | N | MLH1 | MLH1 LOH | MLH1 c.977T>A: p.(Val326Glu) | One | N |
| LLP017 | M | CRC43 | N | MSH2/MSH6 negative | N | MSH2 | Chr2(GRCh37):g.(?_47630331)_(47710089_?) | N | One | N |
| LLP018 | F | CRC48 | Y | MSH2/MSH6 negative | N | MSH2 and MSH6 | MSH6 c.3613_3615del: p.(Thr1205del); no indication for LOH | MSH6 c.3261dup: p.(Phe1088fs) (low variant allele frequency, likely due to MSI) | None | N |
| LLP019 | M | CRC43 | N | MSH2/MSH6 negative | N | MSH2 | MSH2 c.1901T>G: p.(Leu634∗) | N | One | N |
| LLP020 | M | CRC30 | N | MSH2 negative | N | MSH2 | MSH2 LOH | MSH2 c.2459-11A>G | One | N |
| LLP021 | F | CRC43 | Y | Not performed | N | MLH1 | MLH1 LOH | MLH1 c.2059C>T: p.(Arg687Trp) | One | N |
| LLP022 | F | IC43 | Y | MSH6 negative | N | MSH6 | MSH6 LOH | N | One | N |
| LLP023 | F | CRC38 | Y | MSH2/MSH6 negative | N | MSH2 | MSH2 LOH | N | One | N |
| LLP024 | F | CRC32 | U | MSH2/MSH6 negative | N | MSH2 | No indication for LOH | N | None | Alu insertion (intronic): MSH2 c.212-4_212-3ins366 |
| LLP025 | M | CRC50 | Y | MLH1/PMS2 negative | N | MLH1 | Too few informative SNPs | N | None | Tandem duplication (promoter region): MLH1 c.-404_-357dup |
| LLP026 | F | CRC47 | Y | MLH1/PMS2 negative | N | MLH1 and MSH2 (a) | MSH2 c.802del: p.(Ser268HisfsTer6); no indication for LOH in MLH1 | N | One | N |
| LLP027 | M | CRC40 | Y | MSH6 negative | N | MSH6 | No indication for LOH | MSH6 c.2295C>G: p.(Cys765Trp) (variant allele frequency 50%) | None | N |
| LLP028 | M | CRC57 | Y | MSH2/MSH6 negative | N | MSH2 | MSH2 LOH | N | One | N |
| LLP029 | M | BDC61 | Y | MLH1/PMS2 negative | N | MLH1 and PMS2 (b) | MLH1 c.(?_-1)_(1731+1_1732-1)del | N | One | SNV (deep intronic): PMS2 c.2276-400G>C |
| LLP030 | M | CRC62 | Y | MLH1/PMS2 negative | N | MLH1 | MLH1 c.1838_1854del | N | One | N |
| LLP031 | M | CRC46 | Y | MSH2/MSH6 negative | N | MSH2 | No indication for LOH | N | None | Alu insertion (intronic): MSH2 c.1387-3546_1387-3545ins351; SNV (deep intronic): MSH2 c.2459-954A>G |
| LLP032 | M | CRC43 | U | MLH1/PMS2 negative | N | MLH1 | No indication for LOH | N | None | Deletion (intronic): MLH1 c.306+1001_307-642delinsTA |

Immunohistochemistry results and somatic variants are those of the underlined tumor. BDC, bile duct cancer; CRC, colorectal cancer; EC, endometrium cancer; LOH, loss of heterozygosity; IC, ileocecal cancer; KC, kidney cancer; MSI, microsatellite instability; N, no; POL, polyps; SNPs, single nucleotide polymorphisms; SNV, single nucleotide variant; UC, urothelial cancer; U, unknown; Y, yes; ?, unknown age at diagnosis.
(a) MSH2 c.802del; p.(Ser268HisfsTer6) found in tumor with allele frequency of 63% and 90% tumor cell percentage.
(b) Suggestive LOH in PMS2 found.
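As a companion to the luciferase reporter assay described in the Supplementary Methods, the normalization step (firefly units divided by Renilla units, then mean ± SD over 3 replicates) can be sketched as follows. The function names and all readings are hypothetical, and the Grubbs outlier removal mentioned in the methods is deliberately left out, because its critical value depends on the t distribution, which is not available in the standard library.

```python
from statistics import mean, stdev


def normalize_to_renilla(firefly_units, renilla_units):
    """Divide each firefly reading by the Renilla reading from the same well."""
    if len(firefly_units) != len(renilla_units):
        raise ValueError("firefly and Renilla readings must be paired")
    return [f / r for f, r in zip(firefly_units, renilla_units)]


def summarize(firefly_units, renilla_units):
    """Return (mean, SD) of the normalized activity across replicates."""
    ratios = normalize_to_renilla(firefly_units, renilla_units)
    return mean(ratios), stdev(ratios)


def relative_activity(construct_mean, wild_type_mean):
    """Express a construct's mean activity as a fraction of wild-type."""
    return construct_mean / wild_type_mean


if __name__ == "__main__":
    # Made-up luminometer readings for 3 replicate wells per construct.
    wild_type = summarize([100.0, 110.0, 90.0], [10.0, 10.0, 10.0])
    construct = summarize([95.0, 105.0, 100.0], [10.0, 10.0, 10.0])
    print(f"wild-type: {wild_type[0]:.2f} ± {wild_type[1]:.2f}")
    print(f"construct: {construct[0]:.2f} ± {construct[1]:.2f}")
    print(f"relative activity: {relative_activity(construct[0], wild_type[0]):.2f}")
```

Dividing by the cotransfected Renilla signal corrects each well for differences in transfection efficiency and cell number, which is why the comparison between constructs is made on the ratio and not on raw firefly units.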