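Sequence assembly
Transcriptome
Computer science
Computational biology
RNA-Seq
Set (abstract data type)
Sequence (biology)
Software
Gene
Biology
De novo transcriptome assembly
Genetics
Gene expression
Programming language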
Authors
Mihaela Pertea, Geo Pertea, Corina Antonescu, Tsung-Cheng Chang, Joshua T. Mendell, Steven L. Salzberg
Abstract
Using a network flow algorithm from optimization theory enables improved assembly of transcriptomes from RNA-seq reads. Methods used to sequence the transcriptome often produce more than 200 million short sequences. We introduce StringTie, a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble these complex data sets into transcripts. When used to analyze both simulated and real data sets, StringTie produces more complete and accurate reconstructions of genes and better estimates of expression levels, compared with other leading transcript assembly programs including Cufflinks, IsoLasso, Scripture and Traph. For example, on 90 million reads from human blood, StringTie correctly assembled 10,990 transcripts, whereas the next best assembly was of 7,187 transcripts by Cufflinks, which is a 53% increase in transcripts assembled. On a simulated data set, StringTie correctly assembled 7,559 transcripts, which is 20% more than the 6,310 assembled by Cufflinks. As well as producing a more complete transcriptome assembly, StringTie runs faster on all data sets tested to date compared with other assembly software, including Cufflinks.
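The relative improvements quoted in the abstract can be checked from the transcript counts it reports. The following is a minimal sketch of that arithmetic only; the helper name `percent_increase` is hypothetical and is not part of StringTie or Cufflinks.

```python
def percent_increase(new: int, baseline: int) -> float:
    """Relative increase of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


# Correctly assembled transcript counts as reported in the abstract.
blood = percent_increase(10_990, 7_187)      # StringTie vs. Cufflinks, human blood reads
simulated = percent_increase(7_559, 6_310)   # StringTie vs. Cufflinks, simulated data

print(f"human blood: {blood:.1f}% more transcripts")
print(f"simulated:   {simulated:.1f}% more transcripts")
```

Rounded to whole percentages, both values agree with the 53% and 20% figures stated in the abstract.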