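RNA splicing
Transformer
Perturbation (astronomy)
Computer science
Ribonucleic acid (RNA)
Physics
Engineering
Biology
Electrical engineering
Genetics
Quantum mechanics
Gene
Voltage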
Authors
Colin P. McNally, Nour J. Abdulhay, Mona Khalaj, Alihossein Saberi, Balyn W. Zaro, Hani Goodarzi, Vijay Ramani
Identifier
DOI:10.1101/2024.03.20.585793
Abstract
Predicting molecular function directly from DNA sequence remains a grand challenge in computational and molecular biology. Here, we engineer and train bidirectional transformer models to predict the chemical grammar of alternative human mRNA splicing, leveraging the largest perturbative full-length RNA dataset to date. By combining high-throughput single-molecule long-read “chemical transcriptomics” in human cells with transformer models, we train AllSplice, a nucleotide foundation model that achieves state-of-the-art prediction of canonical and noncanonical splice junctions across the human transcriptome. We demonstrate improved performance achieved by incorporating into its training set diverse noncanonical splice sites identified through long-read RNA data. Leveraging the chemical perturbations and multiple cell types in the data, we fine-tune AllSplice to train ChemSplice, the first predictive model of sequence-dependent and cell-type-specific alternative splicing following programmed cellular perturbation. We anticipate the broad application of AllSplice, ChemSplice, and other models fine-tuned on this foundation to myriad areas of RNA therapeutics development.