SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification

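Keywords: Biology; Transcriptome; Computational biology; Pipeline (software); Identification (biology); Proteogenomics; Open reading frame; Computer science; Genetics; Gene; Gene expression; Botany; Programming language; Peptide sequence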
Authors
Manuel Tardáguila, Lorena de la Fuente, Cristina Martí, Cécile Pereira, Francisco Pardo-Palacios, Héctor del Risco, Marc Ferrell, Maravillas Mellado-López, Marissa Macchietto, Kenneth Verheggen, Mariola J. Edelmann, Iakes Ezkurdia, Jesús Vázquez, Michael L. Tress, A Mortazavi, Lennart Martens, Susana Rodríguez‐Navarro, Victoria Moreno‐Manzano, Ana Conesa
Source
Journal: Genome Research [Cold Spring Harbor Laboratory Press]
Volume (issue): 28 (3): 396-411 Citations: 287
Identifier
DOI:10.1101/gr.222976.117
Abstract

High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. The advances in sequencing technology have created a need for studies and tools that can characterize these novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of data and the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how the tool is effective in characterizing and describing the composition of the full-length transcriptome. We perform extensive evaluation of ToFU PacBio transcripts by PCR to reveal that an important number of the novel transcripts are technical artifacts of the sequencing approach and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, resulting more frequently in novel ORFs than novel UTRs, and are enriched in both general metabolic and neural-specific functions. We show that these new transcripts have a major impact in the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms are elusive to proteogenomics detection. SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes.
