Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies

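Normalization (sociology); Sample size determination; Type I and type II errors; Statistical hypothesis testing; Wald test; Biology; False positive paradox; Computational biology; Multiple comparisons problem; Statistics; Replication; DNA microarray; Statistical power; Gene expression profiling; Computer science; Mathematics; Gene expression; Genetics; Gene; Sociology; Anthropology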
Authors
Xiaohong Li,Nigel G. F. Cooper,Timothy E. O’Toole,Eric C. Rouchka
Source
Journal: BMC Genomics [Springer Nature]
Volume (issue): 21 (1) Citations: 36
Identifier
DOI:10.1186/s12864-020-6502-7
Abstract

High-throughput RNA sequencing (RNA-seq) has become an important analytical tool in molecular biology. Although the utility and importance of this technique have grown, uncertainties regarding the proper analysis of RNA-seq data remain. Of primary concern, there is no consensus on which normalization and statistical methods are most appropriate for analyzing these data. The lack of standardized analytical methods leads to uncertainties in data interpretation and study reproducibility, especially for studies reporting high false discovery rates. In this study, we compared a recently developed normalization method, UQ-pgQ2, with three of the most frequently used alternatives, RLE (relative log expression), TMM (trimmed mean of M-values) and UQ (upper quartile normalization), in the analysis of RNA-seq data. We evaluated the performance of these methods for gene-level differential expression analysis by considering three factors: 1) normalization combined with the choice of a Wald test from DESeq2 or an exact test/QL (quasi-likelihood) F-test from edgeR; 2) sample sizes in two balanced two-group comparisons; and 3) sequencing read depths.

Using the MAQC RNA-seq datasets with small numbers of sample replicates, we found that UQ-pgQ2 normalization combined with an exact test can achieve better performance in terms of power and specificity in differential gene expression analysis. However, using an intra-group analysis of false positives from real and simulated data, we found that a Wald test performs better than an exact test when the number of sample replicates is large, and that a QL F-test performs best at sample sizes of 5, 10 and 15 for any normalization. The RLE, TMM and UQ methods performed similarly at a given sample size.

We found that the UQ-pgQ2 method combined with an exact test/QL F-test is the best choice for controlling false positives when the sample size is small. When the sample size is large, UQ-pgQ2 with a QL F-test is a better choice for type I error control in an intra-group analysis. We observed that read depth had minimal impact on differential gene expression analysis based on the simulated data.
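For readers unfamiliar with the normalization methods named in the abstract, the following is a minimal sketch of what RLE and UQ scaling factors compute. It is a simplified illustration in plain Python, not the DESeq2 or edgeR implementation and not the authors' code. The function names and the toy count matrix are hypothetical, the UQ sketch works on raw counts and assumes every sample has a positive upper quartile, and UQ-pgQ2 and TMM are not shown.

```python
import math
from statistics import median


def rle_size_factors(counts):
    """Median-of-ratios (RLE-style) size factors.

    counts: list of genes, each a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    # Use only genes with a positive count in every sample so logs are defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of each gene's geometric mean across samples (the pseudo-reference).
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the reference, gene by gene.
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def upper_quartile_factors(counts):
    """Upper-quartile scaling factors, rescaled to have geometric mean 1.

    Simplifying assumption: computed on raw counts, with a positive
    75th percentile in every sample.
    """
    n_samples = len(counts[0])
    # Drop genes with zero counts in all samples before taking quantiles.
    expressed = [row for row in counts if any(c > 0 for c in row)]
    quartiles = []
    for j in range(n_samples):
        col = sorted(row[j] for row in expressed)
        # 75th percentile with linear interpolation between order statistics.
        pos = 0.75 * (len(col) - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(col) - 1)
        quartiles.append(col[lo] + (pos - lo) * (col[hi] - col[lo]))
    log_mean = sum(math.log(q) for q in quartiles) / n_samples
    return [q / math.exp(log_mean) for q in quartiles]


# Hypothetical toy data: sample 2 was sequenced at twice the depth of sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10], [40, 80]]
rle = rle_size_factors(toy_counts)
uq = upper_quartile_factors(toy_counts)
```

On this toy matrix both methods assign the second sample a factor twice that of the first, which is the depth difference built into the data. Dividing each sample's counts by its factor then puts the two samples on a common scale.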