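Python (programming language)
Computer science
Genome-wide association study
Data mapping
Metadata
Biobank
Data mining
Database
Bioinformatics
World Wide Web
Biology
Operating system
Genetics
Gene
Genotype
Single-nucleotide polymorphism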
Authors
Ben Elsworth, Matthew Lyon, Tessa Alexander, Yi Liu, Peter Matthews, Jon Hallett, P. J. Bates, Tom Palmer, Valeriia Haberland, George Davey Smith, Jie Zheng, Philip Haycock, Tom R. Gaunt, Gibran Hemani
Identifier
DOI:10.1101/2020.08.10.244293
Abstract
Data generated by genome-wide association studies (GWAS) are growing fast with the linkage of biobank samples to health records, and expanding capture of high-dimensional molecular phenotypes. However, the utility of these efforts can only be fully realised if their complete results are collected from their heterogeneous sources and formats, harmonised and made programmatically accessible. Here we present the OpenGWAS database, an open source, open access, scalable and high-performance cloud-based data infrastructure that imports and publishes complete GWAS summary datasets and metadata for the scientific community. Our import pipeline harmonises these datasets against dbSNP and the human genome reference sequence, generates summary reports and standardises the format of results and metadata. Users can access the data via a website, an application programming interface, R and Python packages, and also as downloadable files that can be rapidly queried in high performance computing environments. OpenGWAS currently contains 126 billion genetic associations from 14,582 complete GWAS datasets representing a range of different human phenotypes and disease outcomes across different populations. We developed R and Python packages to serve as conduits between these GWAS data sources and a range of available analytical tools, enabling Mendelian randomization, genetic colocalisation analysis, fine mapping, genetic correlation and locus visualisation. OpenGWAS is freely accessible at https://gwas.mrcieu.ac.uk, and has been designed to facilitate integration with third party analytical tools.
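The abstract states that the import pipeline harmonises datasets against the human genome reference sequence but does not say how. The following is a minimal sketch of one common harmonisation step, not the OpenGWAS implementation. It assumes the convention that the non-effect allele should match the reference allele. The `Association` schema, the `harmonise` function, and the variant identifier are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Association:
    """One GWAS summary-statistic row (hypothetical minimal schema)."""
    rsid: str
    effect_allele: str
    other_allele: str
    beta: float
    eaf: float  # effect allele frequency


def harmonise(row: Association, ref_allele: str) -> Optional[Association]:
    """Orient a row so that the non-effect allele is the reference allele.

    Returns None when neither allele matches the reference.
    """
    if row.other_allele == ref_allele:
        return row  # already reference-oriented
    if row.effect_allele == ref_allele:
        # Swap the alleles; the effect size changes sign and the
        # frequency is restated for the new effect allele.
        return replace(
            row,
            effect_allele=row.other_allele,
            other_allele=row.effect_allele,
            beta=-row.beta,
            eaf=1.0 - row.eaf,
        )
    return None  # mismatch; a fuller pipeline might attempt a strand flip


# Made-up example row: the effect allele "A" happens to be the reference.
row = Association("rs0000001", effect_allele="A", other_allele="G",
                  beta=0.25, eaf=0.25)
flipped = harmonise(row, ref_allele="A")
```

Applying the same orientation rule to every dataset means effect sizes for the same variant can be compared directly across studies, which is the precondition for downstream analyses such as Mendelian randomization and colocalisation.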