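Biology
RNA splicing
Computational biology
Transcriptome
Alternative splicing
Splicing
Intron
Genetics
RNA-seq
Gene
Gene isoform
Ribonucleic acid
Gene expression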
Authors
Nils Wagner, Muhammed Hasan Çelik, Florian R. Hölzlwimmer, Christian Mertes, Holger Prokisch, Vicente A. Yépez, Julien Gagneur
Source
Journal: Nature Genetics
[Springer Nature]
Date: 2023-05-01
Volume (issue): 55 (5): 861-870
Citations: 38
Identifier
DOI: 10.1038/s41588-023-01373-3
Abstract
Aberrant splicing is a major cause of genetic disorders but its direct detection in transcriptomes is limited to clinically accessible tissues such as skin or body fluids. While DNA-based machine learning models can prioritize rare variants for affecting splicing, their performance in predicting tissue-specific aberrant splicing remains unassessed. Here we generated an aberrant splicing benchmark dataset, spanning over 8.8 million rare variants in 49 human tissues from the Genotype-Tissue Expression (GTEx) dataset. At 20% recall, state-of-the-art DNA-based models achieve maximum 12% precision. By mapping and quantifying tissue-specific splice site usage transcriptome-wide and modeling isoform competition, we increased precision by threefold at the same recall. Integrating RNA-sequencing data of clinically accessible tissues into our model, AbSplice, brought precision to 60%. These results, replicated in two independent cohorts, substantially contribute to noncoding loss-of-function variant identification and to genetic diagnostics design and analytics.
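
The abstract reports model quality as precision at a fixed recall (for example, 12% precision at 20% recall). The following is a minimal sketch of how such a figure can be computed from per-variant scores and binary outlier labels. It is not the authors' evaluation code: the function name `precision_at_recall` and the toy scores and labels are assumptions made purely for illustration.

```python
def precision_at_recall(scores, labels, target_recall):
    """Best precision over all score cutoffs whose recall is >= target_recall.

    scores: predicted scores, higher means more likely aberrant (hypothetical).
    labels: 1 if the variant is linked to a splicing outlier, else 0.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank variants from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    best = 0.0
    true_pos = 0
    for k, (score, label) in enumerate(ranked, start=1):
        true_pos += label
        # A cutoff can only sit between distinct scores, so evaluate
        # at the last element of each run of tied scores.
        last_of_tie = k == len(ranked) or ranked[k][0] != score
        if last_of_tie and true_pos / total_pos >= target_recall:
            best = max(best, true_pos / k)
    return best


# Toy data (made up): 10 variants, 3 of them true positives.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1]
print(precision_at_recall(toy_scores, toy_labels, 0.3))
```

Comparing models at one shared recall level, as the abstract does, keeps the comparison fair: every model must recover the same fraction of true aberrant events, and only the fraction of correct calls among its predictions differs.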