Computational biology
Thalassemia
DNA sequencing
Genetics
Molecular diagnostics
Biology
Gene
Bioinformatics
Authors
Yujie Cao,Shau Yin HA,Chi-Chiu So,Tong Ming For,Clara Sze-Man Tang,Huoru Zhang,Rui Liang,Jing Yang,Brian Hon-Yin Chung,Godfrey Chi-Fung Chan,Yu-Lung Lau,Maria-Mercè Garcia-Barceló,Edmond Shiu-Kwan,Pranee Sucharitchan,Nattiya Hirankarn,Wanling Yang
Identifier
DOI:10.1016/j.jmoldx.2022.06.006
Abstract
Thalassemia is one of the most common genetic diseases and a major health threat worldwide. Accurate, efficient, and scalable analysis of next-generation sequencing (NGS) data is much needed for its molecular diagnosis and carrier screening. We developed NGS4THAL, a bioinformatics analysis pipeline analyzing NGS data to detect pathogenic variants for thalassemia and other hemoglobinopathies. NGS4THAL realigns ambiguously mapped NGS reads derived from the homologous Hb gene clusters for accurate detection of point mutations and small insertions/deletions. It uses a combination of complementary structural variant (SV) detection tools and an in-house database of control data containing specific SVs to achieve accurate detection of the complex SV types. Detected variants are matched with those in HbVar (A Database of Human Hemoglobin Variants and Thalassemia Mutations), allowing recognition of known pathogenic variants, including disease modifiers. Tested on simulation data, NGS4THAL achieved high sensitivity and specificity. For targeted NGS sequencing data from samples with laboratory-confirmed pathogenic Hb variants, it achieved 100% detection accuracy. Application of NGS4THAL on whole genome sequencing data from unrelated studies revealed thalassemia mutation carrier rates for Hong Kong Chinese and Northern Vietnamese that were consistent with previous reports. NGS4THAL is a highly accurate and efficient molecular diagnosis tool for thalassemia and other hemoglobinopathies based on tailored analysis of NGS data and may be scaled for population carrier screening.