Beyond Read-Counts: Ribo-seq Data Analysis to Understand the Functions of the Transcriptome

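Biology; Ribosome profiling; Computational biology; RNA sequence; Transcriptome; Repertoire; Complement (music); Ribosome; Set (abstract data type); Translation (biology); Gene; Genetics; Computer science; Bioinformatics; Ribonucleic acid; Gene expression; Messenger RNA; Acoustics; Physics; Phenotype; Complementation; Programming language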
Authors
Lorenzo Calviello, Uwe Ohler
Source
Journal: Trends in Genetics [Elsevier]
Volume (issue): 33 (10): 728-744 Citations: 109
Identifier
DOI: 10.1016/j.tig.2017.08.003
Abstract

By mapping the positions of millions of translating ribosomes in the cell, ribosome profiling (Ribo-seq) has established its role as a powerful tool to study gene expression. Several laboratories have introduced modifications to the experimental protocol and expanded the repertoire of biochemical methods to study translation transcriptome-wide. However, the diversity of protocols highlights a need for standardization. At the same time, different computational analysis strategies have used Ribo-seq data to identify the set of translated sequences with high confidence. In this review we present an overview of such methodologies, outlining their assumptions, data requirements, and availability. At the interface between RNA and proteins, Ribo-seq can complement data from multiple omics approaches, zooming in on the central role of translation in the molecular cell.

Ribo-seq has become an established protocol to identify translated transcript regions via deep sequencing, closing the gap between RNA sequencing and proteomics. Recently developed Ribo-seq data analysis strategies use different features as hallmarks of translation. Specifically, the ability to monitor the positions of translating ribosomes with single-nucleotide precision has driven the development of computational tools that rely on ‘subcodon resolution’. Knowing the concrete assumptions and precise goals of different approaches is crucial. In addition to addressing translation-focused questions, from defining open reading frames to identifying alternative translation initiation sites and estimating differential translation rates, Ribo-seq data show great promise for integrative efforts combining additional omics approaches.

Glossary
Classifier: a machine-learning approach whose objective is to assign datapoints to different classes (two in the case of binary classifiers). In supervised learning, the classifier is trained on known examples, while unsupervised classification methods are used in the absence of known (or labeled) data.
Coding sequence (CDS): a sequence that is translated using one (or more) of the three possible reading frames.
Hidden Markov model (HMM): a probabilistic method in which a signal (e.g., a coverage track or a nucleotide sequence) is emitted from a finite succession of unknown (hidden) states. The hidden states can represent different biological concepts (e.g., 5′-UTRs, ORFs, etc. in genomic sequence classification); transitions between them specify possible sequences of the states, and can be defined and trained on available data (e.g., read coverage or nucleotide sequences in annotated genomic regions). Once the model is trained, it can be used to parse a new signal and label it with the optimal sequence of states.
Long noncoding RNAs (lncRNAs): long transcripts (>200 nt) which do not exhibit clear coding potential.
Multitaper: a signal processing method that aims to provide reliable estimates of the spectrum of frequencies present in a signal. In the multitaper method, multiple filters are applied as windows over the same signal, and coefficients for all frequency components are retrieved from each filtered sample (using the Fourier transform). Different types of filters have been proposed; specifically, the use of the so-called Slepian sequences enables the application of a statistical test to each frequency component.
Non-negative least squares (NNLS): a modified version of the ordinary least squares, in which the regression coefficients cannot be negative values.
Nonsense-mediated decay (NMD): an mRNA surveillance pathway that degrades aberrant transcripts, thus preventing the production of non-functional proteins. One of the proposed mechanisms for NMD involves the recognition of a premature termination codon (PTC), aided by the action of proteins that are part of the exon junction complex (EJC).
Open reading frame (ORF): a section of a transcript which contains a start and a stop codon in frame. In eukaryotes, most mRNA transcripts contain one main ORF that is translated into a polypeptide.
PUNCH-P: a technique that isolates nascent protein chains. Ribosome–nascent chain complexes are first isolated, and biotinylated puromycin is incorporated into the complexes. Streptavidin pulldown allows the nascent protein chains to be extracted, and these can be analyzed by LC-MS/MS.
Quantitative proteomics: proteomics techniques aimed at quantifying protein expression. Label-free quantification methods can be used, but techniques such as SILAC that label amino acids can represent superior alternatives for protein quantification.
Random forest: a classification algorithm that combines the classification output of multiple classifiers, called decision trees. Each tree splits the data into different groups (‘leaves’) and assigns a label to each datapoint in each leaf. Each tree is applied to a subset of the data and features to avoid overfitting. Usually used as a supervised learning method, random forests can also be used for unsupervised learning and for regression tasks.
Regression: this aims to quantify the relationship between a target variable and one (or more) features. To this end, approaches fit a function that minimizes the distance between the predictor and the target variable (e.g., by using the least squares method). The regression coefficient quantifies the relationship between the target variable and the predictor.
Shotgun proteomics: a set of techniques that enable the identification and quantification of protein expression from a mixture of digested peptides, using peptide isolation (usually with liquid chromatography, LC) and tandem mass spectrometry (MS/MS). When they are eluted in the LC step, peptides are ionized, and ions are selected in the first MS step according to their mass-to-charge (m/z) ratio. Ions are then fragmented, and in the second MS step fragment ions are again isolated according to their m/z ratio and quantified. Using a reference protein database, m/z values can be mapped to expected values matching peptides from known proteins.
Spectral coherence: a measure of correlation between two frequency spectra. Signals exhibiting a similar set of frequency components will have high coherence.
Pulsed SILAC (pSILAC): pSILAC is a variant of SILAC in which labeled amino acids are added to the cell culture for short periods of time, thus allowing the kinetics of de novo protein synthesis to be monitored.
Support vector machine (SVM): a binary classification algorithm. SVMs are supervised learning methods and therefore need to be trained on known examples. In the training stage, SVMs aim to define a separating line maximizing the distance between the two sets of data. When a linear separation of the two sets is not effective, SVMs can compute the distance between datapoints in a higher-dimensional space by means of different kernel functions in which a linear separation between the samples is possible. This strategy (the ‘kernel trick’) enables non-linear classification, and has contributed to the popularity of SVMs in the machine-learning community.
Untranslated region (UTR): the section of a coding mature mRNA that does not code for protein. The 5′-UTR is located upstream of the start codon, while the 3′-UTR is downstream of the stop codon.
Upstream ORF (uORF): a small (usually <100 aa) ORF whose start codon is located in the 5′-UTR upstream of the main ORF of a transcript. Many uORFs have been shown to regulate the translation of the main ORF. It is generally assumed that uORFs do not encode stable polypeptides.
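
The ‘subcodon resolution’ signal mentioned above can be illustrated with a small sketch. Translating ribosomes move one codon at a time, so footprint positions in a translated region tend to repeat with a 3-nt period, which shows up as a peak at 1/3 cycles per nucleotide in the frequency spectrum. The code below is only an illustration under stated assumptions: it uses a plain discrete Fourier transform periodogram, not the multitaper method, and the function names (`power_spectrum`, `periodicity_score`) and the count profiles are hypothetical, not taken from the review or from any published tool.

```python
import cmath
import math


def power_spectrum(counts):
    """Plain DFT periodogram: list of (frequency in cycles/nt, power) pairs."""
    n = len(counts)
    if n == 0:
        return []
    mean = sum(counts) / n
    centered = [c - mean for c in counts]  # drop the constant (zero-frequency) part
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        spectrum.append((k / n, abs(coeff) ** 2 / n))
    return spectrum


def periodicity_score(counts):
    """Fraction of total spectral power in the bin closest to 1/3 cycles per nt."""
    spectrum = power_spectrum(counts)
    total = sum(power for _, power in spectrum)
    if total == 0:
        return 0.0  # flat or empty profile: no periodic signal at all
    _, power_at_codon = min(spectrum, key=lambda fp: abs(fp[0] - 1 / 3))
    return power_at_codon / total


# Hypothetical per-nucleotide footprint counts (illustrative values, not real data).
translated_like = [9, 1, 2] * 10                      # one frame dominates
background_like = [4, 3, 5, 4, 4, 3, 5, 4, 3, 4] * 3  # no 3-nt structure

print(periodicity_score(translated_like))
print(periodicity_score(background_like))
```

The first profile concentrates essentially all of its power at the 3-nt period, while the second has almost none there. A single periodogram like this provides no statistical test; the multitaper approach described in the glossary averages several windowed estimates, which is what makes a per-frequency test possible.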