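Biobank
Phaser
1000 Genomes Project
Haplotype
Genome
Pipeline (software)
Whole genome sequencing
Genetics
Chromosome
Computational biology
Computer science
Biology
Data mining
Single-nucleotide polymorphism
Genotype
Engineering
Gene
Electrical engineering
Programming language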
Authors
Brian L. Browning, Sharon R. Browning
Identifier
DOI:10.1101/2022.10.03.510691
Abstract
The first release of UK Biobank whole genome sequence data contains 150,119 genomes. We present an open-source pipeline for filtering, phasing, and indexing these genomes on the cloud-based UK Biobank Research Analysis Platform. This pipeline makes it possible to apply haplotype-based methods to UK Biobank whole genome sequence data. The pipeline uses BCFtools for marker filtering, Beagle for genotype phasing, and tabix for VCF indexing. We used the pipeline to phase 406 million single nucleotide variants on chromosomes 1-22 and X at a cost of 2,309 British pounds. The maximum time required to process a chromosome was 2.6 days. In order to assess phase accuracy, we modified the pipeline to exclude trio parents. We observed a switch error rate of 0.0016 on chromosome 20 in the White British trio offspring. If we exclude markers with nonmajor allele frequency < 0.1% after phasing, this switch error rate decreases by 80% to 0.00032.
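The switch error rate quoted in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: it assumes the common definition (the fraction of consecutive heterozygous-site pairs whose relative phase differs from the truth), and the function name `switch_error_rate` and the tuple-based haplotype representation are hypothetical choices made for illustration. It also assumes the estimated and true genotypes agree at every marker, so only the allele order can differ.

```python
from typing import List, Tuple

Genotype = Tuple[int, int]  # phased alleles (haplotype 1, haplotype 2) at one marker


def switch_error_rate(true_haps: List[Genotype], est_haps: List[Genotype]) -> float:
    """Fraction of consecutive heterozygous sites where the phase flips.

    Illustrative definition only (an assumption, not taken from the paper).
    Homozygous sites carry no phase information and are skipped.
    """
    if len(true_haps) != len(est_haps):
        raise ValueError("haplotype lists must have the same length")

    # For each heterozygous site, record whether the estimated haplotype 1
    # carries the same allele as the true haplotype 1.
    orientation = [t[0] == e[0] for t, e in zip(true_haps, est_haps) if t[0] != t[1]]

    if len(orientation) < 2:
        return 0.0  # no adjacent heterozygous pair, so no switch can be observed

    # A switch occurs whenever the orientation changes between adjacent het sites.
    switches = sum(prev != cur for prev, cur in zip(orientation, orientation[1:]))
    return switches / (len(orientation) - 1)


if __name__ == "__main__":
    # Toy data (made up): four heterozygous sites, one homozygous site.
    truth = [(0, 1), (0, 1), (1, 0), (0, 0), (0, 1)]
    estimate = [(0, 1), (1, 0), (0, 1), (0, 0), (1, 0)]
    print(switch_error_rate(truth, estimate))
```

In the toy data the estimate matches the truth at the first heterozygous site and is flipped at the remaining three, so there is one switch among three adjacent heterozygous pairs. In a trio-based evaluation such as the one the abstract describes, the "true" phase would come from parental genotypes rather than from a known answer.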