Authors
Michael C. Schatz, Anthony Philippakis, Enis Afgan, Eric Banks, Vincent J. Carey, Robert J. Carroll, Alessandro Culotti, Kyle Ellrott, Jeremy Goecks, Robert L. Grossman, Ira M. Hall, Kasper D. Hansen, Jonathan Lawson, Jeffrey T. Leek, Anne O’Donnell-Luria, Stephen Mosher, Martin Morgan, Anton Nekrutenko, Brian D. O’Connor, Kevin Osborn, Benedict Paten, Candace Patterson, Frederick J. Tan, Casey Overby Taylor, Jennifer Vessio, Levi Waldron, Ting Wang, Kristin Wuichet, AnVIL Team
Abstract
The traditional model of genomic data analysis, in which data are downloaded from centralized warehouses and analyzed with local computing resources, is increasingly unsustainable. Not only are transfers slow and cost-prohibitive, but this approach also leads to redundant and siloed compute infrastructure that makes it difficult to ensure security and compliance of protected data. The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) inverts this model, providing a unified cloud computing environment for data storage, management, and analysis. AnVIL eliminates the need for data movement, allows for active threat detection and monitoring, and provides scalable, shared computing resources that can be acquired by researchers as needed. This presents many new opportunities for collaboration and data sharing that will ultimately lead to scientific discoveries at scales not previously possible.