Abstract
The extracellular matrix (ECM) is a complex meshwork of cross-linked proteins providing both biophysical and biochemical cues that are important regulators of cell proliferation, survival, differentiation, and migration. We present here a proteomic strategy developed to characterize the in vivo ECM composition of normal tissues and tumors using enrichment of protein extracts for ECM components and subsequent analysis by mass spectrometry. In parallel, we have developed a bioinformatic approach to predict the in silico "matrisome" defined as the ensemble of ECM proteins and associated factors. We report the characterization of the extracellular matrices of murine lung and colon, each comprising more than 100 ECM proteins and each presenting a characteristic signature. Moreover, using human tumor xenografts in mice, we show that both tumor cells and stromal cells contribute to the production of the tumor matrix and that tumors of differing metastatic potential differ in both the tumor- and the stroma-derived ECM components. The strategy we describe and illustrate here can be broadly applied and, to facilitate application of these methods by others, we provide resources including laboratory protocols, inventories of ECM domains and proteins, and instructions for bioinformatically deriving the human and mouse matrisome.

The abbreviations used are: ECM, extracellular matrix; LC-MS/MS, liquid chromatography-tandem mass spectrometry; OGE, off-gel electrophoresis.

The extracellular matrix (ECM) is a fundamental and important component of metazoan organisms providing architectural support and anchorage for the cells. The ECM consists of a complex meshwork of highly cross-linked proteins and exists as interstitial forms within organs and as specialized forms, such as basement membranes underlying epithelia, vascular endothelium, and surrounding certain other tissues and cell types (e.g. neurons, muscles). Cells adhere to the ECM via transmembrane receptors, among which integrins are the most prominent (1. Hynes R.O. Integrins: bidirectional, allosteric signaling machines. Cell. 2002; 110: 673-687; 2. van der Flier A. Sonnenberg A. Function and interactions of integrins. Cell Tissue Res. 2001; 305: 285-298). 
These cell-matrix interactions result in the stimulation of various signaling pathways controlling proliferation and survival, differentiation, migration, etc. The composition of the ECM and the repertoire of ECM receptors determine the responses of the cells. The biophysical properties of the ECM (deformability or stiffness) have also been shown to modulate these cellular functions (3. Frantz C. Stewart K.M. Weaver V.M. The extracellular matrix at a glance. J. Cell Sci. 2010; 123: 4195-4200; 4. Schwartz M.A. Integrins and extracellular matrix in mechanotransduction. in: Hynes R.O. Yamada K.M. Extracellular Matrix Biology. Cold Spring Harb. Perspect. Biol. 2010; doi: 10.1101/cshperspect.a005066). In addition to core ECM components (fibronectins, collagens, laminins, proteoglycans, etc.), the ECM serves as a reservoir for growth factors and cytokines and ECM-remodeling enzymes that collaborate with ECM proteins to signal to the cells (5. Hynes R.O. The extracellular matrix: not just pretty fibrils. Science. 2009; 326: 1216-1219; 6. Mott J.D. Werb Z. Regulation of matrix biology by matrix metalloproteinases. Curr. Opin. Cell Biol. 2004; 16: 558-564). Hence, the ECM provides not only biophysical cues but also biochemical cues that regulate cell behavior. The ECM is important for normal development, and alterations of the ECM have been associated with various pathologies such as fibrosis, skeletal diseases, and cancer (7. Nelson C.M. Bissell M.J. Of extracellular matrix, scaffolds, and signaling: tissue architecture regulates development, homeostasis, and cancer. Annu. Rev. Cell Dev. Biol. 2006; 22: 287-309; 8. Aszódi A. Legate K.R. Nakchbandi I. Fässler R. What mouse mutants teach us about extracellular matrix function. Annu. Rev. Cell Dev. Biol. 
2006; 22: 591-621; 9. Bateman J.F. Boot-Handford R.P. Lamandé S.R. Genetic diseases of connective tissues: cellular and extracellular effects of ECM mutations. Nat. Rev. Genet. 2009; 10: 173-183) and it has been emphasized recently that the ECM proteome needs better characterization (10. Wilson R. The extracellular matrix: an underexplored but important proteome. Expert Rev. Proteomics. 2010; 7: 803-806).

The role of the ECM in cancer is of particular interest. Long-standing as well as recent data implicate tumor ECM as a significant contributor to tumor progression. Indeed, the ECM is a major component of the tumor microenvironment (11. Cretu A. Brooks P.C. Impact of the non-cellular tumor microenvironment on metastasis: potential therapeutic and imaging opportunities. J. Cell. Physiol. 2007; 213: 391-402; 12. van Kempen L.C. Ruiter D.J. van Muijen G.N. Coussens L.M. The tumor microenvironment: a critical determinant of neoplastic evolution. Eur. J. Cell Biol. 2003; 82: 539-548) and classical pathology has shown that excessive deposition of ECM is a common feature of tumors with poor prognosis. More recently, gene expression screens have revealed that many genes encoding ECM components and ECM receptors are dysregulated during tumor progression (13. Ramaswamy S. Ross K.N. Lander E.S. Golub T.R. A molecular signature of metastasis in primary solid tumors. Nat. Genet. 2003; 33: 49-54; 14. Wong S.Y. Crowley D. Bronson R.T. Hynes R.O. Analyses of the role of endogenous SPARC in mouse models of prostate and breast cancer. Clin. Exp. Metastasis. 
2008; 25: 109-118; 15. Eckhardt B.L. Parker B.S. van Laar R.K. Restall C.M. Natoli A.L. Tavaria M.D. Stanley K.L. Sloan E.K. Moseley J.M. Anderson R.L. Genomic analysis of a spontaneous model of breast cancer metastasis to bone reveals a role for the extracellular matrix. Mol. Cancer Res. 2005; 3: 1-13; 16. Xu L. Begum S. Hearn J.D. Hynes R.O. GPR56, an atypical G protein-coupled receptor, binds tissue transglutaminase, TG2, and inhibits melanoma tumor growth and metastasis. Proc. Natl. Acad. Sci. U.S.A. 2006; 103: 9023-9028). Finally, modifications of the extracellular matrix architecture and biophysical properties have been shown to influence tumor progression (6. Mott J.D. Werb Z. Regulation of matrix biology by matrix metalloproteinases. Curr. Opin. Cell Biol. 2004; 16: 558-564; 17. Erler J.T. Weaver V.M. Three-dimensional context regulation of metastasis. Clin. Exp. Metastasis. 2009; 26: 35-49; 18. Cukierman E. Bassi D.E. Physico-mechanical aspects of extracellular matrix influences on tumorigenic behaviors. Semin. Cancer Biol. 2010; 20: 139-145).

Despite these clear indications that tumor ECM and the interactions of cells with it are very likely to play important roles in tumor progression, we do not have a good picture of ECM composition, origins and functions in tumors. One reason for this lies in the biochemical properties of ECM proteins (large size, insolubility, cross-linking, etc.), which have made attempts to characterize systematically the composition of the ECM from tissues and tumors very challenging.

Thanks to the completion of the genomes of many species and to previous studies (19. Sodergren E. Weinstock G.M. Davidson E.H. Cameron R.A. Gibbs R.A. Angerer R.C. Angerer L.M. Arnone M.I. Burgess D.R. Burke R.D. Coffman J.A. Dean M. 
Elphick M.R. Ettensohn C.A. Foltz K.R. Hamdoun A. Hynes R.O. Klein W.H. Marzluff W. McClay D.R. Morris R.L. Mushegian A. Rast J.P. Smith L.C. Thorndyke M.C. Vacquier V.D. Wessel G.M. Wray G. Zhang L. Elsik C.G. Ermolaeva O. Hlavina W. Hofmann G. Kitts P. Landrum M.J. Mackey A.J. Maglott D. Panopoulou G. Poustka A.J. Pruitt K. Sapojnikov V. Song X. Souvorov A. Solovyev V. Wei Z. Whittaker C.A. Worley K. Durbin K.J. Shen Y. Fedrigo O. Garfield D. Haygood R. Primus A. Satija R. Severson T. Gonzalez-Garay M.L. Jackson A.R. Milosavljevic A. Tong M. Killian C.E. Livingston B.T. Wilt F.H. Adams N. Bellé R. Carbonneau S. Cheung R. Cormier P. Cosson B. Croce J. Fernandez-Guerra A. Genevière A.M. Goel M. Kelkar H. Morales J. Mulner-Lorillon O. Robertson A.J. Goldstone J.V. Cole B. Epel D. Gold B. Hahn M.E. Howard-Ashby M. Scally M. Stegeman J.J. Allgood E.L. Cool J. Judkins K.M. McCafferty S.S. Musante A.M. Obar R.A. Rawson A.P. Rossetti B.J. Gibbons I.R. Hoffman M.P. Leone A. Istrail S. Materna S.C. Samanta M.P. Stolc V. Tongprasit W. Tu Q. Bergeron K.F. Brandhorst B.P. Whittle J. Berney K. Bottjer D.J. Calestani C. Peterson K. Chow E. Yuan Q.A. Elhaik E. Graur D. Reese J.T. Bosdet I. Heesun S. Marra M.A. Schein J. Anderson M.K. Brockton V. Buckley K.M. Cohen A.H. Fugmann S.D. Hibino T. Loza-Coll M. Majeske A.J. Messier C. Nair S.V. Pancer Z. Terwilliger D.P. Agca C. Arboleda E. Chen N. Churcher A.M. Hallböök F. Humphrey G.W. Idris M.M. Kiyama T. Liang S. Mellott D. Mu X. Murray G. Olinski R.P. Raible F. Rowe M. Taylor J.S. Tessmar-Raible K. Wang D. Wilson K.H. Yaguchi S. Gaasterland T. Galindo B.E. Gunaratne H.J. Juliano C. Kinukawa M. Moy G.W. Neill A.T. Nomura M. Raisch M. Reade A. Roux M.M. Song J.L. Su Y.H. Townley I.K. Voronina E. Wong J.L. Amore G. Branno M. Brown E.R. Cavalieri V. Duboc V. Duloquin L. Flytzanis C. Gache C. Lapraz F. Lepage T. Locascio A. Martinez P. Matassi G. Matranga V. Range R. Rizzo F. Röttinger E. Beane W. Bradham C. Byrum C. Glenn T. 
Hussain S. Manning G. Miranda E. Thomason R. Walton K. Wikramanayke A. Wu S.Y. Xu R. Brown C.T. Chen L. Gray R.F. Lee P.Y. Nam J. Oliveri P. Smith J. Muzny D. Bell S. Chacko J. Cree A. Curry S. Davis C. Dinh H. Dugan-Rocha S. Fowler J. Gill R. Hamilton C. Hernandez J. Hines S. Hume J. Jackson L. Jolivet A. Kovar C. Lee S. Lewis L. Miner G. Morgan M. Nazareth L.V. Okwuonu G. Parker D. Pu L.L. Thorn R. Wright R. The genome of the sea urchin Strongylocentrotus purpuratus. Science. 2006; 314: 941-952; 20. Whittaker C.A. Bergeron K.F. Whittle J. Brandhorst B.P. Burke R.D. Hynes R.O. The echinoderm adhesome. Dev. Biol. 2006; 300: 252-266; 21. Hynes R.O. Zhao Q. The evolution of cell adhesion. J. Cell Biol. 2000; 150: F89-F96), it is now clear that vertebrate genomes contain hundreds of genes encoding ECM proteins. Specific features of ECM proteins have emerged from these studies, in particular their distinctive structures based on the repetition of conserved domains (22. Engel J. Domain organizations of modular extracellular matrix proteins and their evolution. Matrix Biol. 1996; 15: 295-299; 23. Hohenester E. Engel J. Domain structure and organisation in extracellular matrix proteins. Matrix Biol. 2002; 21: 115-128). During the last few years, several attempts have been made at in silico predictions of the complement of ECM proteins (24. Jung J. Ryu T. Hwang Y. Lee E. Lee D. Prediction of extracellular matrix proteins based on distinctive sequence and domain characteristics. J. Comput. Biol. 2010; 17: 97-105; 25. Manabe R. Tsutsui K. Yamada T. Kimura M. Nakano I. Shimono C. Sanzen N. Furutani Y. Fukuda T. Ogur Y. Shimamoto K. Kiyozumi D. Sato Y. Sado Y. Senoo H. Yamashina S. Fukuda S. Kawai J. Sugiura N. Kimata K. Hayashizaki Y. Sekiguchi K. 
Transcriptome-based systematic identification of extracellular matrix proteins. Proc. Natl. Acad. Sci. U.S.A. 2008; 105: 12849-12854; 26. Adams J.C. Engel J. Bioinformatic analysis of adhesion proteins. Methods Mol. Biol. 2007; 370: 147-172). Furthermore, recent studies have begun to characterize experimentally the composition of the extracellular matrix of specific model systems such as retinal and vascular basement membranes (27. Didangelos A. Yin X. Mandal K. Baumert M. Jahangiri M. Mayr M. Proteomics characterization of extracellular space components in the human aorta. Mol. Cell. Proteomics. 2010; 9: 2048-2062; 28. Balasubramani M. Schreiber E.M. Candiello J. Balasubramani G.K. Kurtz J. Halfter W. Molecular interactions in the retinal basement membrane system: A proteomic approach. Matrix Biol. 2010; 6: 471-483; 29. Didangelos A. Yin X. Mandal K. Saje A. Smith A. Xu Q. Jahangiri M. Mayr M. Extracellular matrix composition and remodeling in human abdominal aortic aneurysms: a proteomics approach. Mol. Cell. Proteomics. 2011; 10 (M111.008128)), mammary gland (30. Hattar R. Maller O. McDaniel S. Hansen K.C. Hedman K.J. Lyons T.R. Lucia S. Wilson Jr., R.S. Schedin P. Tamoxifen induces pleiotrophic changes in mammary stroma resulting in extracellular matrix that suppresses transformed phenotypes. Breast Cancer Res. 2009; 11: R5; 31. Hansen K.C. Kiemele L. Maller O. O'Brien J. Shankar A. Fornetti J. Schedin P. An in-solution ultrasonication-assisted digestion method for improved extracellular matrix proteome coverage. Mol. Cell. Proteomics. 2009; 8: 1648-1657), and cartilage (32. Wilson R. Diseberg A.F. Gordon L. Zivkovic S. Tatarczuch L. 
Mackie E.J. Gorman J.J. Bateman J.F. Comprehensive profiling of cartilage extracellular matrix formation and maturation using sequential extraction and label-free quantitative proteomics. Mol. Cell Proteomics. 2010; 9: 1296-1313). However, there remains a pressing need for a better definition of the number and diversity of ECM proteins and even of what should be included in that definition. Limitations arise also from the lack of experimental reagents and approaches because of the biochemical intractability of ECM and the lack of an adequate library of antibodies or other probes to characterize ECM proteins in situ. Thus, deciphering the complexity of the extracellular matrix in vivo represents an important scientific challenge.

We describe here the development of proteomics-based methods coupled with a bioinformatic definition of the "matrisome" (ECM and ECM-associated proteins) to analyze the protein composition of the tissue extracellular matrix. We have successfully applied this strategy to characterize in detail the extracellular matrices both of normal murine tissues (lung and colon) and of melanoma tumors (nonmetastatic and metastatic), which each comprise well over 100 proteins. Moreover, we have applied this approach to understand the origins of tumor ECM proteins and have been able to show, using human-into-mouse xenograft models, that both tumor cells and stromal cells contribute in characteristic ways to the ECM of the tumor microenvironment. Furthermore, we show that both tumor and stromal cells contribute to significant changes in the extracellular matrices of tumors of differing metastatic potential. The strategy we describe and illustrate here can be broadly applied and we provide protocols and inventories of ECM domains and proteins to facilitate application of these methods by others.

Normal tissues were from 8- to 12-week-old FVB mice.
Lungs were perfused by intracardiac injection with 3 ml of phosphate-buffered saline to remove blood. Colon segments were rinsed with phosphate-buffered saline to remove feces. A375 and MA2 human melanoma cell lines (16. Xu L. Begum S. Hearn J.D. Hynes R.O. GPR56, an atypical G protein-coupled receptor, binds tissue transglutaminase, TG2, and inhibits melanoma tumor growth and metastasis. Proc. Natl. Acad. Sci. U.S.A. 2006; 103: 9023-9028) were grown in HyClone high-glucose Dulbecco's modified Eagle's medium (Thermo Scientific) supplemented with 2 mM glutamine and 10% fetal bovine serum (Invitrogen, Carlsbad, CA). Eight-week-old NOD/SCID/IL2Rγ null (Jackson Laboratory, Bar Harbor, ME) male mice were anesthetized using isoflurane (Abbott Laboratories, North Chicago, IL) and 5 × 10⁵ cells were injected subcutaneously into the left flank of the mouse. Animals were sacrificed 5 weeks post-injection and the tumors were dissected, flash frozen and kept at −80 °C.

Sequential extractions of frozen samples of tissues or tumors were performed using the CNMCS (Cytosol/Nucleus/Membrane/Cytoskeleton) Compartmental Protein Extraction kit (Cytomol, Union City, CA) according to manufacturer's instructions. In brief, frozen tissues (150–200 mg) or tumors (200–400 mg) were homogenized and extracted sequentially to remove (1) cytosolic proteins, (2) nuclear proteins, (3) membrane proteins, and (4) cytoskeletal proteins, leaving a final insoluble fraction enriched for ECM proteins. Fractions were separated on SDS-polyacrylamide gradient gels, transferred to nitrocellulose membranes and probed with antibodies to proteins characteristic of different subcellular compartments (see Fig. 1A and Extended Experimental Procedures).

ECM-enriched fractions were solubilized in urea, disulfide bonds reduced and alkylated, and proteins digested with PNGaseF, Lys-C, and trypsin. Solutions that began cloudy upon initial reconstitution were clear after overnight digestion. The resulting peptides were separated by off-gel electrophoresis (OGE) according to isoelectric point and by reversed-phase high-performance liquid chromatography followed by tandem mass spectrometry (MS/MS) on an LTQ Orbitrap mass spectrometer. Mass spectra were interpreted with SpectrumMill and annotated using the matrisome bioinformatics lists developed in this work. MS/MS spectra were searched against a UniProt database containing either mouse only or both mouse (53,448 entries) and human (78,369 entries) sequences; all sequences (including isoforms and excluding fragments) were downloaded from the UniProt web site on June 30, 2010. To each database a set of common laboratory contaminant proteins (73 entries) was appended. Peptides identified with a false discovery rate < 2.5% were assembled into identified proteins, and our in silico matrisome list was then used to categorize all of the identified proteins as being ECM derived or not. MS/MS spectra searches allowed for carbamidomethylation of cysteines and possible carbamylation of N termini as fixed/mix modifications.
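The filtering and categorization steps just described can be illustrated with a minimal Python sketch. It is not the SpectrumMill implementation: the target-decoy FDR estimate, the record fields (`peptide`, `protein`, `score`, `decoy`) and both function names are assumptions introduced here for illustration only.

```python
from collections import defaultdict

MAX_FDR = 0.025  # threshold quoted in the text (FDR < 2.5%)


def filter_by_fdr(psms, max_fdr=MAX_FDR):
    """Return target PSMs ranked above the deepest score cutoff at which
    the decoy/target ratio is still below max_fdr.

    Assumption: a simple target-decoy estimate; the search software used
    in the study may estimate the FDR differently.
    """
    ranked = sorted(psms, key=lambda p: p["score"], reverse=True)
    targets = decoys = cutoff = 0
    for rank, psm in enumerate(ranked, start=1):
        if psm["decoy"]:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets < max_fdr:
            cutoff = rank  # estimated FDR is still acceptable at this depth
    return [p for p in ranked[:cutoff] if not p["decoy"]]


def categorize_proteins(psms, matrisome):
    """Assemble peptides into proteins and label each one ECM or non-ECM
    according to membership in a matrisome accession list."""
    peptides_by_protein = defaultdict(set)
    for psm in psms:
        peptides_by_protein[psm["protein"]].add(psm["peptide"])
    return {
        acc: ("ECM" if acc in matrisome else "non-ECM")
        for acc in peptides_by_protein
    }
```

In practice the `matrisome` argument would be the set of accessions from the in silico matrisome list, and the input records would come from the search engine's export.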
Allowed variable modifications were oxidized methionine, deamidation of asparagine, pyro-glutamic acid modification at N-terminal glutamine, and hydroxylation of proline with a precursor MH+ shift range of –18 to 97 Da. Hydroxyproline was only observed in the proteins known to have it (collagens and proteins containing collagen domains, emilins, etc.) and only within the expected GXPG sequence motifs. Supplemental Tables S7 and S8, which contain the detailed peptide spectral matches, might have some examples not in the expected motif when there is either a proline near the motif for which the spectrum could have had insufficient fragmentation to confidently localize the mass change to a particular residue, or a nearby methionine in the peptide and the spectrum had insufficient fragmentation to localize the mass change to oxidized Met or hydroxyproline. When the motif nX[ST] occurs in a peptide in supplemental Tables S7 and S8, this is likely to indicate a site where N-linked glycosylation was removed by the PNGaseF treatment of the sample. Although a lowercase n indicates a gene-encoded asparagine residue detected in aspartic acid form, possible mechanisms of modification such as acid-catalyzed deamidation during sample processing versus enzymatic conversion during deglycosylation cannot be explicitly distinguished. Our automated database searching based interpretation of the MS/MS spectra did not attempt to detect any of the many known examples of crosslinking previously observed in collagen family proteins (33. Eyre D.R. The collagens of articular cartilage. Semin. Arthritis Rheum. 1991; 21: 2-11; 34. Robins S.P. Biochemistry and functional significance of collagen cross-linking. Biochem. Soc. Trans. 2007; 35: 849-852) nor did our sample processing methods attempt to enrich for or deplete crosslinked peptides from the samples.
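For readers mining the peptide spectral match tables, the two sequence conventions mentioned above (a lowercase n inside an nX[ST] motif, and hydroxyproline within GXPG) could be checked with a short Python sketch such as the following. It assumes that modified residues are written as lowercase letters in the peptide string, with `p` standing for hydroxyproline; that encoding and both function names are assumptions made for illustration, not a description of the supplemental file format.

```python
import re


def deglycosylation_sites(peptide):
    """Return 0-based positions of a lowercase n that starts an nX[ST] motif.

    X is taken as any residue, as written in the text; a lookahead is used
    so that overlapping motifs are all reported.
    """
    return [m.start() for m in re.finditer(r"(?=n[A-Za-z][ST])", peptide)]


def hydroxyprolines_outside_gxpg(peptide):
    """Return 0-based positions of hydroxyproline (assumed to be written
    as lowercase p) that do not occupy the third position of a GXPG motif."""
    outside = []
    for i, residue in enumerate(peptide):
        if residue != "p":
            continue
        in_motif = (
            i >= 2
            and i + 1 < len(peptide)
            and peptide[i - 2].upper() == "G"
            and peptide[i + 1].upper() == "G"
        )
        if not in_motif:
            outside.append(i)
    return outside
```

A non-empty result from `hydroxyprolines_outside_gxpg` would flag peptides worth inspecting by hand for the localization ambiguities described above.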
Because crosslinked peptides were neither searched for nor enriched or depleted, the spectra generated in this study may be a valuable resource to mine for sites and forms of collagen crosslinking. Additional detailed information can be found in the Extended Experimental Procedures. The raw LC-MS/MS data associated with this manuscript may be downloaded from ProteomeCommons.org Tranche using the following hash: onoKvMCC9umP0dW2GlZSGN/DVfz6tqoyRvsA16h351RjGJZiqjk0wWGYQB9+zPpRj+ASAgKsd779N0Td4250FCPJ6jUAAAAAAABFoQ==.

The human and mouse proteomes were each screened for proteins containing domains characteristic of ECM proteins, ECM-affiliated proteins, ECM modifiers and secreted factors. Those lists were subsequently screened to eliminate proteins that shared one or more of the defining domains but were not ECM or ECM-associated proteins based on other criteria. Detailed information can be found in the Extended Experimental Procedures. We have also deployed a webpage providing a collection of resources (data files, sequence files) and further annotations on the bioinformatic pipeline developed for this study at http://web.mit.edu/hyneslab/matrisome/.

Tumor samples were formalin-fixed and paraffin-embedded. Sections were dewaxed and rehydrated following standard procedures. Antigen retrieval was performed by incubating sections in boiling 10 mM sodium citrate buffer (pH 6.0) for 20 min. Sections were then blocked with PBS containing 4% ovalbumin. Incubation with rabbit anti-HAPLN1 antibody (Sigma) was performed overnight at 4 °C and secondary antibody incubation for 2 h at room temperature. Secondary goat-anti-rabbit antibody conjugated with Alexa-568 was from Invitrogen. Sections were counterstained with DAPI (4′,6-diamidino-2-phenylindole) to visualize nuclei.

Analysis of the protein composition of the extracellular matrix presents challenges due to the diversity, large size, insolubility and cross-linking of these proteins. By contrast, most other cellular components are soluble even at relatively low concentrations of salt or detergents.
Therefore, we took advantage of the insolubility of ECM proteins to enrich for them while depleting other cellular components. We used a subcellular fractionation protocol to extract sequentially components from the cytosol, the nucleus, the membrane and the cytoskeleton and enrich for ECM proteins (see Extended Experimental Procedures). Fig. 1A shows the sequential extraction of proteins from the different cellular compartments, using diagnostic marker proteins for each compartment. ECM proteins such as fibronectin (as well as laminins and collagens, not shown) were not extracted during these intermediate steps and were found to be enriched in the final insoluble fraction. To analyze the composition of the ECM-enriched fractions obtained after depletion of other cellular components, we digested the proteins to peptides and employed a proteomics pipeline shown in Fig. 1B using liquid chromatography combined with tandem mass spectrometry (LC-MS/MS) to identify peptides and proteins (see Extended Experimental Procedures). Analysis by LC-MS/MS of ECM proteins enriched from murine lung and digested to peptides confirmed a significant enrichment for matrix proteins, with more than 75% of the total precursor ion intensity (the sum of MS1 precursor ion peak areas for all identified peptides) corresponding to proteins defined as ECM (Fig. 2A, left panel). To help measure the success of our enrichment strategy and focus downstream biological follow-up we sought to categorize the identified proteins as being ECM-derived or not. The categorization of each protein identified by mass spectrometry was initially performed using the Gene Ontology (GO) "Cellular Compartment" annotations. However, this annotation showed several clear limitations. 
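The enrichment measure used above (the share of the total MS1 precursor ion intensity that belongs to proteins defined as ECM) is a simple ratio. A minimal Python sketch is given below; the record layout (`protein`, `intensity`) and the function name are hypothetical and chosen only for illustration.

```python
def ecm_intensity_fraction(peptides, ecm_proteins):
    """Fraction of summed precursor ion intensity assigned to ECM proteins.

    peptides: iterable of dicts with a protein accession and the MS1
    precursor ion peak area of one identified peptide.
    ecm_proteins: set of accessions categorized as ECM.
    """
    peptides = list(peptides)
    total = sum(p["intensity"] for p in peptides)
    if total == 0:
        raise ValueError("no precursor ion intensity to summarize")
    ecm = sum(p["intensity"] for p in peptides if p["protein"] in ecm_proteins)
    return ecm / total
```

A value above 0.75 from such a calculation would correspond to the level of enrichment reported here for murine lung.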
Among the limitations of the GO annotation, several cytosolic or cytoskeletal proteins involved in cell-matrix adhesion are mis-annotated as being part of the extracellular matrix (see supplemental Table S1); conversely, some known ECM proteins (thrombospondin 1, von Willebrand factor, agrin, etc.) are defined by vague terms such as "external side of the plasma membrane" or "cell surface" (supplemental Table S1). In addition, more than 20 different GO categories correspond to the extracellular matrix (extracellular matrix, basal lamina, basement membrane, etc.) and yet UniProt identifiers for some known ECM proteins are not associated with any cellular compartments in the Gene Ontology database. Finally, and of importance to the study of human/mouse xenografts (see below), we noted conflicting annotations between human and mouse proteins.

In order to interpret the mass spectrometric data we needed a better definition of which proteins should be considered as part of the ECM. Therefore, we developed a bioinformatic approach to predict within any genome the ensemble of genes encoding what we define as the "matrisome," namely all those components constituting the extracellular matrix (the "core matrisome") and those components associated with it ("matrisome-associated" proteins). One hallmark of ECM proteins is their domain-based structure (23. Hohenester E. Engel J. Domain structure and organisation in extracellular matrix proteins. Matrix Biol. 2002; 21: 115-128). Exploiting this characteristic, we established a list of 55 diagnostic InterPro domains commonly found in ECM proteins (type I, II and III fibronectin domains, type I thrombospondin repeats, laminin G domain, etc.; Fig. 3A and supplemental Table S2A). This domain list was used to screen the UniProt protein database.
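The positive selection step amounts to keeping every protein that carries at least one diagnostic domain. The Python sketch below illustrates the idea on toy data. The domain labels (`FN3`, `TSP1`, `LamG`, `Kinase`) are placeholders rather than InterPro accessions, and the accessions and function name are hypothetical; the actual lists are those of supplemental Table S2A.

```python
def positive_screen(proteome, diagnostic_domains):
    """Return sorted accessions of proteins with at least one diagnostic domain.

    proteome: dict mapping a protein accession to its annotated domains.
    diagnostic_domains: set of domain labels considered characteristic of ECM.
    """
    return sorted(
        accession
        for accession, domains in proteome.items()
        if diagnostic_domains & set(domains)
    )


# Toy example with placeholder domain labels (not real InterPro identifiers).
toy_proteome = {
    "PROT_A": ["FN3", "FN3", "TSP1"],
    "PROT_B": ["Kinase"],
    "PROT_C": ["LamG", "Kinase"],
}
candidates = positive_screen(toy_proteome, {"FN3", "TSP1", "LamG"})
```

Note that `PROT_C` passes this screen even though it also carries a domain unrelated to the ECM, which is why a screen of this kind yields candidates rather than a final list.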
We know that some of the domains used to select positively for ECM proteins are also found in transmembrane receptors and proteins involved in cell adhesion (growth factor receptors, integrins, etc.) that do not belong to the ECM. These families of proteins also display a subset of specific domains (e.g. tyrosine kinase and phosphatase domains) and transmembrane domains incompatible with definition as "extracellular matrix" proteins. Therefore, a second step comprised a negative selection using 20 domains (supplemental Table S2B) and a transmembrane domain prediction (see Extended Experimental Procedures for details). This analysis was performed in parallel for both the mouse and human genomes and the respective murine and human matrisome lists were compared based on orthology. Manual curation of the matrisome lists also allowed us to add a very few known ECM proteins that do not contain any known domains; for example, dermatopontin and dentin sialophosphoprotein (supplemental Table S3). Finally, knowledge-based annotation of these gene lists allowed us t