Statistics
Mathematics
Statistical
Correlation
Random variable
Proportion (ratio)
False discovery rate
Combinatorics
Physics
Geometry
Biochemistry
Quantum mechanics
Gene
Chemistry
Identifier
DOI:10.1198/jasa.2010.tm09129
Abstract
We consider large-scale studies in which there are hundreds or thousands of correlated cases to investigate, each represented by its own normal variate, typically a z-value. A familiar example is a microarray experiment comparing the expression levels of thousands of genes in healthy and sick subjects. This paper concerns the accuracy of summary statistics for the collection of normal variates, such as their empirical cdf or a false discovery rate statistic. It might seem that we must estimate an N × N correlation matrix, where N is the number of cases, but our main result shows that this is not necessary: good approximations of that accuracy can be based on the root mean square correlation over all N ⋅ (N − 1)/2 pairs, a quantity that is often easily estimated. A second result shows that z-values closely follow normal distributions even under nonnull conditions, supporting application of the main theorem. Practical application of the theory is illustrated on a large leukemia microarray study.
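The key quantity in the abstract is the root mean square (RMS) correlation over all N(N − 1)/2 pairs of cases. The Python sketch below computes a naive plug-in version of it. It assumes NumPy and a hypothetical data matrix `x` with one row per case (e.g. one gene) and one column per subject. It also shows why the full N × N matrix need not be formed: if each centered row is scaled to unit length, the correlation matrix is R = ZZᵀ, and the identity ‖ZZᵀ‖²_F = ‖ZᵀZ‖²_F lets the sum of squared correlations come from an n × n matrix instead. The function names are hypothetical, and this is not claimed to be the estimator used in the paper.

```python
import numpy as np


def rms_correlation(x: np.ndarray) -> float:
    """Naive plug-in RMS correlation over all N*(N-1)/2 row pairs.

    x: array of shape (N, n); each row is one case (hypothetically one gene)
    observed on n columns (subjects). Rows must not be constant.
    """
    # Center each row and scale it to unit Euclidean length, so that the
    # N x N sample correlation matrix is exactly R = z @ z.T.
    z = x - x.mean(axis=1, keepdims=True)
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    n_cases = z.shape[0]
    # ||z z^T||_F^2 == ||z^T z||_F^2, so only an n x n matrix is formed,
    # never the N x N correlation matrix itself.
    gram = z.T @ z
    total_sq = np.sum(gram * gram)
    # Each diagonal entry of R equals 1, contributing N to the sum of squares.
    off_diag_sq = max(total_sq - n_cases, 0.0)
    # The off-diagonal entries count each unordered pair twice, so dividing
    # by N*(N-1) gives the mean over the N*(N-1)/2 distinct pairs.
    return float(np.sqrt(off_diag_sq / (n_cases * (n_cases - 1))))


def rms_correlation_direct(x: np.ndarray) -> float:
    """Reference version that builds the full N x N matrix (for checking)."""
    r = np.corrcoef(x)  # rows are treated as variables
    iu = np.triu_indices_from(r, k=1)  # each unordered pair once
    return float(np.sqrt(np.mean(r[iu] ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((500, 20))  # simulated, independent rows
    print(rms_correlation(x), rms_correlation_direct(x))
```

For independent normal rows, each sample correlation computed from n observations has E[r²] = 1/(n − 1), so this naive estimate is biased upward when n is small, even if the true correlations are all zero. A practical estimate needs a correction for that sampling noise, which this sketch omits.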