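Transcriptome
Biology
Pipeline (software)
Computational biology
Splicing
Software
RNA-Seq
Alternative splicing
De novo transcriptome assembly
Deep sequencing
Computer science
Sequence assembly
Data mining
Bioinformatics
Genetics
Genome
Gene
Gene isoform
Gene expression
Programming language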
Authors
Seong Woo Han,San Jewell,Andrei Thomas‐Tikhonenko,Yoseph Barash
Source
Journal: Genome Research
[Cold Spring Harbor Laboratory]
Date: 2024-09-25
Volume/issue: gr.278659.123
Identifiers
DOI:10.1101/gr.278659.123
Abstract
Mapping transcriptomic variations using either short- or long-read RNA sequencing is a staple of genomic research. Long reads can capture entire isoforms and overcome repetitive regions, while short reads still provide higher coverage and lower error rates. Yet how to quantitatively compare the technologies, whether they can be combined, and what the benefit of such a combined view would be remain open questions. We tackle these questions by first creating a pipeline to assess matched long- and short-read data using a variety of transcriptome statistics. We find that across datasets, algorithms, and technologies, matched short-read data detect roughly 30% more splice junctions, such that 10-30% of the splice junctions included at 20% or more in short reads are missed by long reads. In contrast, long reads detect many more intron retention events and can detect full isoforms, pointing to the benefit of combining the technologies. We introduce MAJIQ-L, an extension of the MAJIQ software that enables a unified view of transcriptome variations from both technologies, and demonstrate its benefits. Our software can be used to assess any future long-read technology or algorithm and to combine it with short-read data for improved transcriptome analysis.
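The two headline statistics in the abstract (how many more junctions short reads detect, and what share of well-included short-read junctions long reads miss) reduce to simple set comparisons. The sketch below illustrates that arithmetic only. It assumes junctions are identified by exact (chromosome, start, end, strand) coordinates and that short-read inclusion is given as a PSI value in [0, 1]; the function `compare_junctions` and its parameters are hypothetical and are not part of the MAJIQ or MAJIQ-L API. Exact coordinate matching is a simplification, since long-read alignments can shift junction boundaries.

```python
from typing import Dict, Set, Tuple

# Hypothetical junction key: (chromosome, intron start, intron end, strand).
# This is an illustrative assumption, not MAJIQ-L's internal representation.
Junction = Tuple[str, int, int, str]


def compare_junctions(
    short_psi: Dict[Junction, float],
    long_junctions: Set[Junction],
    min_psi: float = 0.2,
) -> Tuple[float, float]:
    """Return (excess ratio, missed fraction) for matched short/long-read data.

    excess ratio: (short-read junction count - long-read junction count)
        divided by the long-read count, so 0.3 means "30% more junctions".
    missed fraction: share of short-read junctions with PSI >= min_psi
        that are absent from the long-read junction set.
    """
    if not long_junctions:
        raise ValueError("long-read junction set is empty")
    excess = (len(short_psi) - len(long_junctions)) / len(long_junctions)

    # Keep only junctions that short reads include at or above the threshold.
    well_included = {j for j, psi in short_psi.items() if psi >= min_psi}
    if not well_included:
        return excess, 0.0

    # Junctions seen (and well included) in short reads but never in long reads.
    missed = well_included - long_junctions
    return excess, len(missed) / len(well_included)
```

For example, with four short-read junctions (PSI 0.9, 0.5, 0.25, 0.1) and three long-read junctions of which two overlap the short-read set, the sketch returns an excess ratio of 1/3 and a missed fraction of 1/3: the 0.1 junction falls below the 20% threshold, and one of the three remaining junctions has no long-read counterpart.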