Authors
Michael A. Gillette,Shankha Satpathy,Song Cao,Saravana M. Dhanasekaran,Suhas Vasaikar,Karsten Krug,Francesca Petralia,Yize Li,Wen-Wei Liang,Boris Reva,Azra Krek,Jiayi Ji,Xiaoyu Song,Wenke Liu,Runyu Hong,Lijun Yao,Lili M. Blumenberg,Sara R. Savage,Michael C. Wendl,Bo Wen,Kai Li,Lihao Tang,Melanie A. MacMullan,Shayan C. Avanessian,M. Harry Kane,Chelsea J. Newton,MacIntosh Cornwell,Ramani B. Kothadia,Wanli Ma,Seungyeul Yoo,Rahul Mannan,Pankaj Vats,Chandan Kumar‐Sinha,Emily Kawaler,Tatiana Omelchenko,Antonio Colaprico,Yifat Geffen,Yosef E. Maruvka,Felipe da Veiga Leprevost,Maciej Wiznerowicz,Zeynep H. Gümüş,Rajwanth Veluswamy,Galen Hostetter,David I. Heiman,Matthew A. Wyczalkowski,Tara Hiltke,Mehdi Mesri,Christopher R. Kinsinger,Emily S. Boja,Gilbert S. Omenn,Arul M. Chinnaiyan,Henry Rodriguez,Qing Kay Li,Scott D. Jewell,Mathangi Thiagarajan,Gad Getz,Bing Zhang,David Fenyö,Kelly V. Ruggles,Marcin Cieślik,Ana I. Robles,Karl R. Clauser,Ramaswamy Govindan,Pei Wang,Alexey I. Nesvizhskii,Ding Li,D. R. Mani,Steven A. Carr,Alex Webster,Alicia Francis,Alyssa Charamut,Amanda G. Paulovich,Amy M. Perou,Andrew K. Godwin,Andrii Karnuta,Annette Marrero-Oliveras,Barbara Hindenach,Barbara L. Pruetz,Bartosz Kubisa,Brian J. Druker,Chet Birger,Corbin D. Jones,Dana R. Valley,Daniel C. Rohrer,Daniel Cui Zhou,Daniel W. Chan,David Chesla,David Clark,Dmitry Rykunov,Donghui Tan,Е. В. Пономарева,Elizabeth Duffy,Eric Burks,Eric E. Schadt,Erik J. Bergstrom,Eugene S. Fedorov,Ewa Malc,George Wilson,Haiquan Chen,Halina M. Krzystek,Hongwei Liu,Houston Culpepper,Hua Sun,Hui Zhang,Jacob Day,James Suh,Jeffrey R. Whiteaker,Jennifer Eschbacher,John P. McGee,Karen A. Ketchum,Karin D. Rodland,Karna Robinson,Katherine A. Hoadley,Kei Suzuki,Ki Sung Um,Kim Elburn,Liang-Bo Wang,Lijun Chen,Linda Hannick,Liqun Qi,Lori J. Sokoll,Małgorzata Wojtyś,Marcin J. Domagalski,Marina Gritsenko,Mary Beth Beasley,Matthew E. Monroe,Matthew J. Ellis,Maureen A. Dyer,Meghan C. Burke,Melissa Borucki,Menghong Sun,Michael H. A. 
Roehrl,Michael J. Birrer,Michael S. Noble,Michael Schnaubelt,Michael Vernon,Michelle Chaikin,Mikhail Krotevich,Munziba Khan,Myvizhi Esai Selvan,Nancy Roche,Nathan Edwards,Negin Vatanian,Olga Potapova,Pamela Grady,Peter B. McGarvey,Piotr A. Mieczkowski,Pushpa Hariharan,Rashna Madan,Ratna R. Thangudu,Richard Smith,Robert J. Welsh,Robert Zelt,Rohit Mehra,Ronald Matteotti,Sailaja Mareedu,Samuel H. Payne,Sandra Cottingham,Sanford P. Markey,Seema Chugh,Shaleigh Smith,Shirley Tsang,Shuang Cai,Simina M. Boca,Sonya Carter,Stacey Gabriel,Stephanie Young,Stephen E. Stein,Sunita Shankar,Tanya Krubit,Tao Liu,Tara Skelly,Thomas Bauer,Uma Velvulou,Umut Özbek,Vladislav Petyuk,Volodymyr Sovenko,William Bocik,William W. Maggio,Xi Chen,Yan Shi,Yige Wu,Yingwei Hu,Yuxing Liao,Zhen Zhang,Zhiao Shi
Abstract
• Comprehensive LUAD proteogenomics exposes multi-omic clusters and immune subtypes
• Phosphoproteomics identifies candidate ALK-fusion diagnostic markers and targets
• Candidate drug targets: PTPN11 (EGFR), SOS1 (KRAS), neutrophil degranulation (STK11)
• Phospho and acetyl modifications denote tumor-specific markers and druggable proteins

To explore the biology of lung adenocarcinoma (LUAD) and identify new therapeutic opportunities, we performed comprehensive proteogenomic characterization of 110 tumors and 101 matched normal adjacent tissues (NATs) incorporating genomics, epigenomics, deep-scale proteomics, phosphoproteomics, and acetylproteomics. Multi-omics clustering revealed four subgroups defined by key driver mutations, country, and gender. Proteomic and phosphoproteomic data illuminated biology downstream of copy number aberrations, somatic mutations, and fusions and identified therapeutic vulnerabilities associated with driver events involving KRAS, EGFR, and ALK. Immune subtyping revealed a complex landscape, reinforced the association of STK11 with immune-cold behavior, and underscored a potential immunosuppressive role of neutrophil degranulation. Smoking-associated LUADs showed correlation with other environmental exposure signatures and a field effect in NATs. Matched NATs allowed identification of differentially expressed proteins with potential diagnostic and therapeutic utility. This proteogenomics dataset represents a unique public resource for researchers and clinicians seeking to better understand and treat lung adenocarcinomas.

Lung cancers are the leading cause of cancer deaths in the United States (Siegel et al., 2019) and worldwide (Bray et al., 2018). Despite therapeutic advances including tyrosine kinase inhibitors and immunotherapy, sustained responses are rare and prognosis remains poor (Herbst et al., 2018), with a 19% overall 5-year survival rate in the United States (Bray et al., 2018) and a worldwide ratio of lung cancer mortality-to-incidence of 0.87. Adenocarcinoma (LUAD), the most common lung malignancy, is strongly related to tobacco smoking but also the subtype most frequently found in individuals who have reported no history of smoking (“never-smokers”) (Subramanian and Govindan, 2007; Sun et al., 2007). The genetics and natural history of LUAD are strongly influenced by smoking status, gender, and ethnicity, among other variables (Chapman et al., 2016; Okazaki et al., 2016; Subramanian and Govindan, 2007; Sun et al., 2007). However, contemporary large-scale sequencing efforts have typically been based on cohorts of smokers with limited ethnic diversity.
Among the major sequencing studies that have helped elucidate the genomic landscape of LUAD (Clinical Lung Cancer Genome Project (CLCGP) and Network Genomic Medicine (NGM), 2013; Ding et al., 2008; Imielinski et al., 2012), only The Cancer Genome Atlas (TCGA) measured a small subset of proteins and phosphopeptides, restricted to a 160-protein reversed phase array (Cancer Genome Atlas Research Network, 2014). As the most frequent genomic aberrations in LUAD involve RAS/RAF/RTK pathway genes that lead to cellular transformation mainly by inducing proteomic and phosphoproteomic alterations (Cully and Downward, 2008), global proteogenomic profiling is needed to provide deeper mechanistic insights. Furthermore, although prior molecular characterization has identified a number of oncologic dependencies and facilitated the development of effective inhibitors for LUAD driven by EGFR mutation (Lynch et al., 2004; Paez et al., 2004) and ALK (Kwak et al., 2010), ROS1 (Shaw et al., 2014), and RET fusions (Gautschi et al., 2017; Kohno et al., 2012; Takeuchi et al., 2012), a substantial proportion of LUADs still lack known or currently targetable mutations.

To further our understanding of LUAD pathobiology and potential therapeutic vulnerabilities, the National Cancer Institute (NCI)’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) undertook comprehensive genomic, deep-scale proteomic, and post-translational modification (PTM) analyses of paired (patient-matched) LUAD tumors and normal adjacent tissues (NATs). Our integrative proteogenomic analyses focused particularly on novel and clinically actionable insights revealed in the proteome and PTMs. The underlying data represent an exceptional resource for further biological, diagnostic, and drug discovery efforts. Another large, deep-scale proteogenomics study of lung adenocarcinoma in the Taiwanese population appears in this issue (Chen et al., 2020).

We investigated the proteogenomic landscape of 110 treatment-naive LUAD tumors and 101 paired NATs, prospectively collected under strict protocols limiting ischemic time. The samples represented diverse demographic and clinical characteristics including country of origin and smoking status (Figure 1A; Table S1).
After confirmation of LUAD histopathology by multiple expert pathologists, aliquots of cryopulverized tissue were profiled by whole-exome sequencing (WES, nominal 150x coverage), whole-genome sequencing (WGS, nominal 15x coverage), RNA sequencing (RNA-seq), microRNA sequencing (miRNA-seq), array-based DNA methylation analysis, and in-depth proteomic, phosphoproteomic, and acetylproteomic characterization (Figures 1B and S1A; Tables S2 and S3), with complete data for 101 tumors and 96 NATs. Tandem mass tags (TMT)-based isobaric labeling was used for precise relative quantification of proteins, phosphosites, and acetylsites. Excellent reproducibility and data quality were maintained across the entire dataset (Figures S1C–S1F). Appropriate filtering resulted in a comprehensive, deep-scale proteogenomic dataset allowing extensive integrative analysis (Figure 1C; Tables S2 and S3). The general landscape of somatic alterations, focal amplifications, and deletions in this study was consistent with prior large-scale profiling efforts including TCGA (Campbell et al., 2016; Cancer Genome Atlas Research Network, 2014; Weir et al., 2007), although with a different distribution likely due to the greater demographic diversity and larger proportion of self-reported never-smokers in the current study (Figure 1D).

Figure S1. Experimental Workflow and Data Quality Metrics, Related to Figure 1
(A) Schematic representation showing sample processing steps. Fresh frozen tumors and their matched normal-adjacent tissues (NATs) were cryopulverized and aliquoted for genomics and proteomics analyses before undergoing comprehensive proteogenomic characterization, facilitating uniformity in input samples.
(B) Schematic representation of the workflows used for proteome, phosphoproteome and acetylproteome analyses. Tandem mass tags (TMT) were used to multiplex 9 samples (4 tumors and their matched NATs, in addition to a 9th sample, an unpaired tumor) and 1 common reference (pool of all tumors and NATs) that was used to link multiple TMT10 plexes. Matched tumor / NAT pairs were included in the same TMT plex.
(C) Pearson similarity matrices showing intra- and inter-plex reproducibility across 4 interspersed comparative reference (CompRef) process replicates for proteome, phosphoproteome and acetylproteome. The CompRef process replicates demonstrated excellent reproducibility (Pearson Correlation, Proteome: R = 0.91, Phosphoproteome: R = 0.88, Acetylproteome: R = 0.73) and consistent identifications across several months of data acquisition time.
(D) Bar plot showing consistent numbers of identified and quantified proteins, phosphosites and acetylsites across the 25 plexes used for analyzing 212 tumors and NATs.
(E) Principal component analysis (PCA) plot representation of proteome, phosphoproteome and acetylproteome separately for tumors and NATs, colored by TMT plex (n = 25).
PCA was based on features that were fully quantified across all 25 TMT plexes.
(F) Sample-wise Pearson correlation between copy number alteration (CNA) and RNA, and between CNA and Proteome. The dark red-colored diagonal demonstrates the absence of sample swaps.
(G) Cophenetic correlation coefficient (y axis) calculated for a range of factorization ranks (x axis). The maximal cophenetic correlation coefficient was observed for rank K = 4 as shown in red.
(H) Silhouette plot for K = 4. This plot indicates the quality of cluster separation.
(I) Non-negative matrix factorization (NMF) clustering applied individually to proteome, phosphoproteome and acetylproteome. Each heatmap shows the maximum-normalized membership score for each sample (x axis) in each cluster (y axis) - essentially, the strength of a sample’s “belongingness” to each of the clusters. The proteome cluster overlaps substantially with the multi-omics clusters depicted in Figure 1E, but divergence is seen in both the phosphoproteome and acetylproteome, with additional substructure in the phosphoproteome. Color schematics for the different annotations and data rows are detailed in the bottom panel.
(J) Louvain clustering of miRNA showed parallels with NMF results but identified five clusters. miRNA cluster 2 was markedly enriched for tumors from multi-omics cluster C1, in turn aligned with proximal-inflammatory RNA signatures, while miRNA cluster 3 was enriched for the STK11 mutant subset of the NMF C3, proximal-proliferative cluster. While the remaining three miRNA clusters had mixed composition, miRNA cluster 5 was markedly enriched for ALK fusion-driven tumors, including all 5 EML4-ALK as well as the HMBOX1-ALK fusions.

To investigate the intrinsic structure of the proteogenomics data, non-negative matrix factorization (NMF)-based unsupervised clustering was performed on RNA, protein, phosphosites, and acetylsites, collectively as “multi-omics clustering” and individually (except RNA) (Figures 1E and S1G–S1I). The four stable clusters (C1–4) (Figure 1E) overlapped with previously characterized mRNA-based proximal-inflammatory, proximal-proliferative, and terminal respiratory unit clusters (Cancer Genome Atlas Research Network, 2014; Wilkerson et al., 2012) but subdivided the second of these into two distinct clusters. The core samples of the clusters were significantly associated with distinctive clinical and molecular features (p value < 0.01; Figure 1F; Table S1). Cluster 1 (C1), aligned with proximal-inflammatory, was enriched for TP53 mutants, STK11 wild type (WT), and CpG island methylator phenotype (CIMP)-high status; C2, a proximal-proliferative subcluster, was distinguished by Western patients (especially from the United States), TP53 and EGFR WT status, and intermediate CIMP status; C3, the dominant proximal-proliferative cluster, was enriched for Vietnamese patients and STK11 mutation (including two structural events identified from WGS; Table S1); and C4, aligned with terminal respiratory unit, was enriched for EGFR mutations, female sex and Chinese nationality, and was essentially devoid of KRAS or STK11 mutations. Most of the samples harboring EML4-ALK fusions were assigned to C4 and lacked mutations in other key driver genes, consistent with a primary role for EML4-ALK in LUAD tumorigenesis (Gao et al., 2018). Of note, NMF clustering based on sample purity-adjusted protein data matrices led to similar clusters compared to the unadjusted data.
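The maximum-normalized membership scores mentioned in the Figure S1I caption can be illustrated with a small sketch. It assumes, as an interpretation rather than a documented detail of the study, that each sample's NMF coefficients are divided by that sample's largest coefficient and that the sample is assigned to the cluster where the score peaks. The matrix `H`, the function name, and the toy values are hypothetical.

```python
def max_normalize_membership(H):
    """Scale each sample's NMF coefficients by that sample's maximum.

    H[k][j] is the non-negative coefficient of sample j in cluster k
    (hypothetical layout: clusters as rows, samples as columns).
    Returns (scores, labels): scores has the same shape as H with each
    column rescaled to a maximum of 1.0; labels[j] is the index of the
    cluster with the highest score for sample j.
    """
    n_clusters = len(H)
    n_samples = len(H[0])
    scores = [[0.0] * n_samples for _ in range(n_clusters)]
    labels = []
    for j in range(n_samples):
        column = [H[k][j] for k in range(n_clusters)]
        peak = max(column)
        if peak <= 0:
            raise ValueError(f"sample {j} has no positive membership")
        for k in range(n_clusters):
            scores[k][j] = column[k] / peak
        labels.append(column.index(peak))
    return scores, labels


# Toy coefficient matrix: 3 clusters x 3 samples (illustrative values only).
H = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.1],
    [0.0, 0.4, 0.6],
]
scores, labels = max_normalize_membership(H)
```

A score of 1.0 marks the assigned cluster; the remaining scores indicate how strongly the sample also resembles the other clusters.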
Although NMF clusters had distinctive biology, linear models did not identify biologically coherent sets of differential markers between sexes, tumor stages, or histological subtypes once major covariates were accounted for (Table S3).

To further explore the biology associated with the multi-omics taxonomy, we performed over-representation pathway analysis (Zhang et al., 2016) using differentially regulated genes, proteins, and post-translational modifications (PTMs) in each of the clusters (Figure 1E; Table S3). C1/proximal-inflammatory samples were primarily associated with immune signaling across multiple data types. The C2 subset of the proximal-proliferative subtype demonstrated signaling by Rho GTPases, as well as signatures of hemostasis and platelet activation, signaling, and degranulation, suggestive of systematic disturbances in coagulation homeostasis. The dominant proximal-proliferative subtype in C3 had a distinctive histone deacetylase signature but also an upregulation of cell cycle pathways. Finally, the terminal respiratory unit subtype in C4 was distinguished by surfactant metabolism, MAPK1/MAPK3 signaling, MECP2 regulation, and chromatin organization in the acetylproteome. Notably, C1, characterized by increased expression of immune system-related genes, included samples with high non-synonymous mutation burden and CIMP-high status. Altogether, the pathway enrichment analysis highlights intrinsic differences in both oncogenic signaling and host response across LUAD subtypes.

To explore the pattern of miRNA expression in LUAD, we performed unsupervised Louvain clustering of 107 tumor samples with available miRNA data based on expression of mature miRNAs.
Five subgroups of LUAD patients were identified by their distinctive miRNA expression profiles (Figure S1J; Table S3). Two of the miRNA clusters were markedly enriched for tumors from C1/proximal-inflammatory and C3/proximal-proliferative multi-omics clusters, whereas the remaining three miRNA clusters had mixed composition. One miRNA cluster included all five EML4-ALK as well as the HMBOX1-ALK fusion tumors and featured high expression of miR-494, miR-495, and miR-496, the first two previously implicated in non-small cell lung cancer (NSCLC) (Romano et al., 2012; Chen et al., 2017). The vast majority of patients with STK11 mutations were categorized into another subgroup in which well-documented cancer-associated miRNAs such as miR-106b-5p, miR-20a-5p, and miR-17-5p were highly expressed (Lu et al., 2017; Shi et al., 2018).
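Over-representation analysis of the kind applied to the cluster-specific pathways is often scored with a one-sided hypergeometric test. The sketch below assumes that formulation; the text does not state the exact statistic used, and the function name and toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe, pathway, hits, overlap):
    """One-sided over-representation p value, P(X >= overlap).

    universe: number of features tested in total (for example, quantified proteins)
    pathway:  number of those features annotated to the pathway
    hits:     number of differentially regulated features
    overlap:  number of differentially regulated features in the pathway
    X follows a hypergeometric distribution under the null hypothesis that
    the differential features are drawn at random from the universe.
    """
    if not (0 <= overlap <= min(pathway, hits) and pathway <= universe and hits <= universe):
        raise ValueError("inconsistent counts")
    total = comb(universe, hits)
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, hits - k)
        for k in range(overlap, min(pathway, hits) + 1)
    )
    return tail / total


# Toy numbers: 10 features, 4 in the pathway, 3 differential, 2 of them in the pathway.
p_value = hypergeom_enrichment_p(universe=10, pathway=4, hits=3, overlap=2)
```

In practice the p values from many pathways would also need multiple-testing correction, which this sketch leaves out.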
The relationships between epigenetic and genomic events and downstream expression of RNA, proteins, and PTMs were explored in detail. Cross-referencing gene fusions in the cohort with a curated kinase fusion database (Gao et al., 2018) allowed identification of all rearrangements involving kinases (Figure 2A). Although fusions involving ALK, ROS1, RET, and PTK2 genes were most recurrent, several novel, potentially oncogenic kinase fusions were also discovered. Generally, such oncogenic kinases contained in-frame fusions, whereas kinases with a tumor suppressive role (such as STK11, STK4, ATM, FRK, and EPHA1) exhibited disruptive out-of-frame events (Figure 2A). Several kinase fusions showed commensurate differential RNA, protein, and phosphosite expression of the index cases (Figure 2B). Besides ALK, instances of ROS1, RET, PRKDC, and PDGFRA overexpression were found in tumors but not in paired NAT samples.

Investigation of the fusion architecture of the highly recurrent in-frame ALK gene fusions (n = 7) identified multiple 5′ partners including the well-established EML4 as well as novel HMBOX1 and ANKRD36B genes (Figure S2A). WGS data provided precise genomic breakpoints in the intron proximal to exon-20 (e20) underlying ALK rearrangements in five cases (Figure S2B). All ALK gene fusion cases showed outlier expression of ALK mRNA, and all in which the protein was detected (4/7) showed outlier ALK total protein abundance. However, the most dramatic difference was seen in the specific increase in ALK phosphosite Y1507 (Figure 2C).
While RNA expression levels of the 5′ partner genes were uniformly high and did not differ between fusion-positive and -negative samples (Figure 2D), both EML4-Y226 and HMBOX1-S141 showed increased phosphorylation only in the corresponding gene fusion-positive tumor samples (Figure 2E). We employed immunohistochemistry (IHC) to validate observation of the fusion-specific ALK phosphosite Y1507 using commercially available ALK and phospho (Y1507) ALK antibodies. We noted tumor-specific positive staining in all available ALK fusion-positive cases, whereas no detectable staining was observed in either samples with ROS1/RET fusions or paired NATs (Figures 2F and S2C). To assess phosphorylation of canonical and possible novel targets by mislocalized ALK fusion proteins (Ducray et al., 2019), we identified all protein phosphorylation events associated with ALK fusion. This analysis identified tyrosine phosphorylation of multiple proteins such as SND1, HDLBP, and ARHGEF5 (Figure 2G), providing new potential insights into oncogenic ALK fusion protein signaling, pending further validation to establish direct functional connections. SND1, for instance, has previously been described as an oncogene (Jariwala et al., 2017Jariwala N. Raja