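Biology
Computational biology
Genetics
Gene
Missense mutation
Gradient boosting
Genome
Disease
Prioritization
Human genetics
Machine learning
Mutation
Computer science
Random forest
Medicine
Pathology
Economics
Management science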
Authors
Fengxiao Bu, Mingjun Zhong, Qinyi Chen, Yumei Wang, Xia Zhao, Qian Zhang, Xiarong Li, Kevin T. Booth, Héla Azaiez, Yu Lu, Jing Cheng, Richard J. Smith, Huijun Yuan
Source
Journal: Human Genetics
[Springer Nature]
Date: 2022-02-19
Volume (issue): pages: 141 (3-4): 401-411
Citations: 7
Identifier
DOI:10.1007/s00439-022-02440-1
Abstract
Numerous computational prediction tools have been introduced to estimate the functional impact of variants in the human genome based on evolutionary constraints and biochemical metrics. However, their use in diagnostic settings to classify variants has faced challenges with accuracy and validity. Most existing tools take a pan-genome, pan-disease approach, which neglects gene- and disease-specific properties and limits the accessibility of curated data. As a proof of concept, we developed a disease-specific prediction tool, the Deafness Variant deleteriousness Prediction tool (DVPred), which focuses on the 157 genes reported to cause genetic hearing loss (HL). DVPred applies the gradient boosting decision tree (GBDT) algorithm to a dataset of expert-curated pathogenic and benign variants from a large in-house HL patient cohort and public databases. By incorporating variant-level and gene-level features, DVPred outperformed existing universal tools, achieving an area under the curve (AUC) of 0.98 and showing consistent performance (AUC = 0.985) on an independent assessment dataset. We further demonstrated that multiple gene-level metrics, including low-complexity genomic regions and substitution intolerance scores, were the top features of the model. A comprehensive analysis of missense variants showed a gene-specific ratio of predicted deleterious to neutral variants, implying that tolerance to variation differs among genes. DVPred demonstrates the utility of a disease-specific strategy for improving deafness variant prediction. It can improve the prioritization of pathogenic variants among the large number of variants identified by high-throughput sequencing of HL genes, and it sheds light on the development of variant prediction tools for other genetic disorders.
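To make the two technical ingredients of the abstract concrete (gradient boosting over decision trees, and evaluation by AUC), here is a minimal, dependency-free sketch. It is not DVPred's implementation: the data are synthetic, the two features are hypothetical stand-ins for a variant-level and a gene-level score, the trees are depth-1 stumps, and all function names and hyperparameters are assumptions chosen for illustration.

```python
import math
import random

def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimizes squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not right:  # left is never empty because t is an observed value
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]

def fit_gbdt(X, y, n_rounds=20, lr=0.3):
    """Gradient boosting for log loss: each stump fits the current residuals y - p."""
    f = [0.0] * len(X)  # running log-odds prediction per training example
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss with respect to the log-odds.
        residuals = [yi - 1.0 / (1.0 + math.exp(-fi)) for yi, fi in zip(y, f)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lr * lm, lr * rm))  # store leaf values already shrunk by lr
        f = [fi + (lr * lm if row[j] <= t else lr * rm) for row, fi in zip(X, f)]
    return stumps

def score(stumps, row):
    """Sum the leaf values of all stumps; higher means more likely deleterious."""
    return sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)

# Synthetic data (assumption): label 1 = pathogenic, 0 = benign.
# Feature 0 mimics a variant-level score, feature 1 a gene-level score.
random.seed(0)
y = [i % 2 for i in range(200)]
X = [[random.gauss(2.0 * yi, 1.0), random.gauss(2.0 * yi, 1.0)] for yi in y]

model = fit_gbdt(X[:150], y[:150])  # train on the first 150 variants
test_auc = auc(y[150:], [score(model, row) for row in X[150:]])  # evaluate on the rest
print(f"held-out AUC: {test_auc:.3f}")
```

The held-out split plays the role of an independent assessment set: AUC is computed only on examples the model never saw during boosting. A production model would more likely use an established GBDT library with deeper trees and many more features.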