Authors
Ratna R. Thangudu, Paul Rudnick, Michael Holck, Deepak Singhal, Michael J. MacCoss, Nathan Edwards, Karen A. Ketchum, Christopher R. Kinsinger, Erika Kim, Anand Basu
Abstract
The objective of the National Cancer Institute's Proteomic Data Commons (PDC) is to make cancer-related proteomic datasets accessible to the public. The PDC provides the cancer research community with a unified data repository that enables data sharing across cancer proteomic studies and also enables multi-omic integration in support of precision medicine. As a domain-specific repository within the Cancer Research Data Commons (CRDC), the PDC aims to give researchers the ability to find and analyze proteomic data across a wide variety of tumor types. The PDC currently houses data for nearly 40 datasets from more than 12 cancer types, supported by a large collection of metadata attributes. These datasets were produced by several large-scale cancer research programs, each with cohort sizes greater than 100 patients. The PDC facilitates the analysis of proteomic, genomic, and imaging data derived from the same tumor: most of its datasets also have corresponding genomic and imaging data available in the Genomic Data Commons and The Cancer Imaging Archive, respectively. Researchers can discover which genomic variants are detectable at the protein level, or better understand associations between gene expression, copy number variation, and protein abundance. The resource is currently available to the public in beta phase (https://pdc.esacinc.com) and will be officially launched on the cancer.gov domain in March 2020. The PDC data portal is supported by a robust and extensible data model and provides user-friendly exploration, visualization, and data analysis. Researchers can search for and visualize the expression of proteins (through their mapped genes) across all studies, analyze protein abundance for all cases in a study through heatmaps, build and explore pan-cancer cohorts using highly curated clinical metadata, and comprehensively view a study without needing to download the data.
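The abstract notes that researchers can study associations between gene expression and protein abundance. One common way to quantify such an association, shown here as a minimal sketch and not as the PDC's own method, is a per-gene Spearman rank correlation across cases that have both measurements. The nested-dictionary layout (gene → case ID → value) and all function names are hypothetical. Missing protein values, which are common in proteomic data, are assumed to be encoded as `None` and are dropped pairwise.

```python
from math import sqrt
from typing import Dict, List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation; None if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def spearman(x: Sequence[Optional[float]], y: Sequence[Optional[float]],
             min_pairs: int = 3) -> Optional[float]:
    """Spearman correlation over pairs where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < min_pairs:
        return None
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))


def mrna_protein_correlation(
    rna: Dict[str, Dict[str, float]],
    protein: Dict[str, Dict[str, Optional[float]]],
) -> Dict[str, Optional[float]]:
    """Correlate RNA and protein per gene, using only cases measured in both."""
    result: Dict[str, Optional[float]] = {}
    for gene in sorted(rna.keys() & protein.keys()):
        cases = sorted(rna[gene].keys() & protein[gene].keys())
        result[gene] = spearman([rna[gene][c] for c in cases],
                                [protein[gene][c] for c in cases])
    return result
```

Rank correlation is used here because it is insensitive to the different scales of RNA and protein measurements. A gene with fewer than `min_pairs` complete pairs, or with constant values, yields `None` rather than a spurious coefficient.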
The PDC provides quick access to mappings of peptide identities and quantities onto the human genome, as well as to protein databases containing patient/tumor-specific variants and novel splicing events. It also enables fast, accurate, and convenient proteomic validation of novel genomic alterations through the PepQuery algorithm. Through a highly versatile application programming interface (API), the PDC allows users to interact with data programmatically and to integrate it with data from other resources in their own scripts for multi-omic analysis. Big data interoperability is critical for progress in precision medicine. The PDC is designed to interoperate with other resources, including the CRDC nodes, allowing users to analyze PDC data with the tools and pipelines available on the NCI cloud resources. It further allows users to apply their own tools to co-analyze genomic and proteomic data derived from a common sample, either on the Amazon Web Services (AWS) platform or on a local system. The presentation will provide an overview of the PDC and its available datasets, as well as a discussion of how it facilitates multi-omic data analyses.

Citation Format: Ratna Rajesh Thangudu, Paul A. Rudnick, Michael Holck, Deepak Singhal, Michael J. MacCoss, Nathan J. Edwards, Karen A. Ketchum, Christopher R. Kinsinger, Erika Kim, Anand Basu. Proteomic Data Commons: A resource for proteogenomic analysis [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr LB-242.
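The abstract states that the PDC API lets users query data programmatically from their own scripts. The sketch below is not taken from PDC documentation. It assumes a GraphQL-style endpoint that accepts JSON POST bodies, and the endpoint URL, the query name `study`, the argument `study_submitter_id`, and the returned fields are all illustrative assumptions to check against the current PDC API documentation. Building the request is separated from sending it so that the payload can be inspected without network access.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumption: endpoint URL for the PDC GraphQL API; verify before use.
PDC_GRAPHQL_URL = "https://pdc.cancer.gov/graphql"

# Hypothetical query: the query name, argument, and fields are illustrative.
STUDY_QUERY = """
query StudyInfo($submitterId: String!) {
  study(study_submitter_id: $submitterId) {
    study_id
    study_submitter_id
    disease_type
  }
}
"""


def build_pdc_request(query: str,
                      variables: Optional[Dict[str, Any]] = None,
                      url: str = PDC_GRAPHQL_URL) -> urllib.request.Request:
    """Package a GraphQL query as a JSON POST request (not sent yet)."""
    payload: Dict[str, Any] = {"query": query}
    if variables:
        payload["variables"] = variables
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_pdc_query(request: urllib.request.Request,
                  timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and return the GraphQL 'data' object."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        result = json.loads(resp.read().decode("utf-8"))
    # GraphQL servers report query problems in an "errors" list.
    if result.get("errors"):
        raise RuntimeError(f"GraphQL errors: {result['errors']}")
    return result["data"]
```

A call would look like `run_pdc_query(build_pdc_request(STUDY_QUERY, {"submitterId": "<study submitter ID>"}))`, where the placeholder must be replaced with a real study identifier from the portal. The returned dictionary can then be joined with genomic data from other resources in the same script.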