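Biology
Circular RNA
Computational biology
RNA splicing
Splicing
Computer science
Ribonucleic acid (RNA)
Graph
Terrace (agriculture)
Transcriptome
Algorithm
Gene
Genetics
Theoretical computer science
Gene expression
History
Archaeology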
Authors
Tasfia Zahin, Qian Shi, Xiaonan Zang, Mingfu Shao
Source
Journal: Genome Research
[Cold Spring Harbor Laboratory]
Date: 2024-07-26
Volume/issue: gr.279106.124
Identifier
DOI: 10.1101/gr.279106.124
Abstract
Circular RNA (circRNA) is a class of RNA molecules that forms a closed loop with its 5' and 3' ends covalently bonded. circRNAs are known to be more stable than linear RNAs, admit distinct properties and functions, and have been proven to be promising biomarkers. Existing methods for assembling circRNAs heavily rely on the annotated transcriptomes, hence exhibiting unsatisfactory accuracy without a high-quality transcriptome. We present TERRACE, a new algorithm for full-length assembly of circRNAs from paired-end total RNA-seq data. TERRACE uses the splice graph as the underlying data structure that organizes the splicing and coverage information. We transform the problem of assembling circRNAs into finding paths that "bridge" the three fragments in the splice graph induced by back-spliced reads. We adopt a definition for optimal bridging paths and a dynamic programming algorithm to calculate such optimal paths. TERRACE features an efficient algorithm to detect back-spliced reads missed by RNA-seq aligners, contributing to its much improved sensitivity. It also incorporates a new machine-learning approach trained to assign a confidence score to each assembled circRNA, which is shown to be superior to using abundance for scoring. On both simulations and biological datasets, TERRACE consistently outperforms existing methods by a large margin in sensitivity while maintaining better or comparable precision. In particular, when the annotations are not provided, TERRACE assembles 123%-413% more correct circRNAs than state-of-the-art methods. TERRACE presents a major leap in assembling full-length circRNAs from RNA-seq data, and we expect it to be widely used in the downstream research on circRNAs.
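The bridging step described in the abstract can be illustrated with a small sketch. The abstract does not state how an "optimal" bridging path is defined, so the code below assumes one plausible choice: the path whose weakest edge has the highest coverage (a bottleneck path). The function name `best_bridging_path`, the integer vertex numbering, and the edge-weight dictionary are all hypothetical and are not taken from TERRACE itself.

```python
from typing import Dict, List, Optional, Tuple


def best_bridging_path(
    n: int,
    edges: Dict[Tuple[int, int], float],
    source: int,
    target: int,
) -> Optional[Tuple[List[int], float]]:
    """Return (path, bottleneck) for the source->target path whose minimum
    edge weight is largest, or None if target is unreachable.

    Assumption: vertices 0..n-1 are already in topological order, so every
    edge (u, v) satisfies u < v, as in a splice graph sorted by coordinate.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * n          # best[v]: largest bottleneck of any path source -> v
    prev: List[Optional[int]] = [None] * n
    best[source] = float("inf")   # the empty path has no limiting edge

    # Group outgoing edges by their start vertex.
    out: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for (u, v), w in edges.items():
        if not u < v:
            raise ValueError("edges must go from a lower to a higher vertex")
        out[u].append((v, w))

    # Dynamic programming over vertices in topological order.
    for u in range(source, target + 1):
        if best[u] == neg_inf:
            continue              # u is not reachable from source
        for v, w in out[u]:
            cand = min(best[u], w)
            if cand > best[v]:
                best[v] = cand
                prev[v] = u

    if best[target] == neg_inf:
        return None

    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, best[target]


# Toy splice graph: two routes from vertex 0 to vertex 3.
toy_edges = {(0, 1): 10, (1, 3): 2, (0, 2): 5, (2, 3): 4}
result = best_bridging_path(4, toy_edges, 0, 3)
```

In the toy graph, the route through vertex 1 is limited by an edge of weight 2, while the route through vertex 2 is limited by an edge of weight 4, so the sketch prefers the latter. A real assembler would also have to handle the circular structure induced by the back-splice junction, which this sketch does not attempt.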